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ABSTRACT. Machine learning techniques are utilised in several areas of as- 
trophysical research today. This dissertation addresses the application of ML 
techniques to two classes of problems in astrophysics, namely, the analysis of in- 
dividual astronomical phenomena over time and the automated, simultaneous 
analysis of thousands of objects in large optical sky surveys. Specifically inves- 
tigated are (1) techniques to approximate the precise orbits of the satellites of 
Jupiter and Saturn given Earth-based observations as well as (2) techniques to 
quickly estimate the distances of quasars observed in the Sloan Digital Sky Sur- 
vey. Learning methods considered include genetic algorithms, particle swarm 
optimisation, artificial neural networks, and radial basis function networks. 

This first part of this dissertation demonstrates that GAs and PSO can 
both be efficiently used to model functions that are highly non-linear in several 
dimensions. It is subsequently demonstrated in the second part that ANNs 
and RBFNs can be used as effective predictors of spectroscopic redshift given 
accurate photometry, especially in combination with other learning-based ap- 
proaches described in the literature. Careful application of these and other 
ML techniques to problems in astronomy and astrophysics will contribute to a 
better understanding of stellar evolution, binary star systems, cosmology, and 
the large-scale structure of the universe. 
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CHAPTER 1 



Introduction: Astrophysical and Astronomical 

Applications 

1.1. Motivation 

1.1.1. Modelling Astrophysical Phenomena. One basic problem that has 
always confronted astrophysicists is that of coming to understand the physical dy- 
namics behind a set of astronomical observations over time, i.e., trying to learn 
what causes the behaviour we see from our telescopes on Earth. On a superficial 
level, however, we often observe behaviour that is at best indirectly related to the 
actual dynamics of a system, or even seemingly counterintuitive. For example, we 
might observe a single pulsating beam of light in the sky, progressing from bright 
to very bright to dim to very bright, and so on: what we are actually seeing is two 
stars of unequal magnitude rotating around each other in our line of sight, eclipsing 
one another in turn as they progress. 

Astrophysical modelling allows us to take theories of expected astrophysical 
behaviour, and, combining them with order-of-magnitude estimates, apply them 
to a particular phenomenon to determine whether our observations are consistent 
with our expectations. When we understand precisely the physical behaviour in 
an observed system (e.g., the simple orbit of a moon around a planet), we are able 
to determine specific parameters of the system (say, the elements that completely 
specify the moon's orbit) by modelling on our observations. 

1.1.2. Data Mining in Large Astronomical Sky Surveys. A newer prob- 
lem beginning to confront astronomers is the need to analyse and make sense of 
data obtained in large sky surveys such as the Sloan Digital Sky Survey (SDSS) 
|47j . the 2dF QSO Redshift Survey (2QZ), and the forthcoming Panoramic Survey 
Telescope and Rapid Response System (Pan-STARRS) and Large Synoptic Survey 
Telescope (LSST). Once in operation, these newer surveys will collect terabytes [45 
of data each night, orders of magnitude more than could be analysed by any team 
using traditional techniques. 

xi 



xii 1. INTRODUCTION: ASTROPHYSICAL AND ASTRONOMICAL APPLICATIONS 

In order to conduct useful science on these tera- and petascale survey databases, 
more efficient and accurate data mining techniques are needed — especially those 
that require little or no human intervention. As survey databases continue to grow 
by orders of magnitude, our analysis techniques will have to keep pace, and machine 
learning techniques are particularly well suited for the task. 



1.2. Traditional vs. Learning-Based Approaches 

Learning algorithms may be preferred for their accuracy, efficiency, and/or scal- 
ability. For example, linear or quadratic polynomial fitting or nonlinear regression 
may be straightforwardly used to model a system or estimate a relation Q but can 
often lead to "large systematic deviations" [42] . A nonlinear solution estimated 
with a learning technique may be harder for humans to make sense of (as with 
'black box' techniques), but can be arbitrarily more complex — and thus sometimes 
arbitrarily more precise. 

The traditional method of photometric redshift estimation before learning tech- 
niques were introduced had been spectral energy distribution (SED) template- 
fitting: a theoretical composite SED is developed by averaging the spectra of hun- 
dreds or thousands of objects that are thought to be parametrically similar (e.g., 
high-z quasars, Seyfert galaxies, or Type la supernovae). Absorption and emission 
lines are identified and shifted into place, and the resulting SED is then redshifted 
to several values to form a set of templates 19j . Photometric measurements are 
then compared to these templates, and a best-fit redshift is determined by min- 
imisation over some chi-square distribution. Unfortunately, this method requires a 
set of representative templates, and cannot be reasonably utilised for astronomical 
objects that cannot be safely classified^] (e.g., quasars, which were until recently 
not well understood, and which are difficult enough to distinguish from stars in 
colour-colour space |31j ). 

Particularly useful also is the ability to adjust a learning algorithm to suit a 
particular problem-specific need (say, for efficiency over accuracy, or for distributed 
computation over an unknown number of systems), which is often not possible with 
traditional statistical approaches. 



1 CL 03] and especially [21]. 
2 Cf. [27] 



1.3. PROJECT AIMS AND DISSERTATION STRUCTURE xiii 

1.3. Project Aims and Dissertation Structure 

The aims of this project are two-fold: we aim firstly to demonstrate the ap- 
plicability of learning techniques to two subsets of current astrophysical problems, 
and secondly to compare the advantages and disadvantages of each learning method 
considered. It is hoped that interdisciplinary investigations such as this will help 
further cooperation between those with domain-specific knowledge of astrophysical 
problems and those with a grasp of the theoretical foundations of machine learning 
techniques — a cooperation that is becoming increasingly important for the future 
of astrophysical research. 

This dissertation is divided into two parts: Part I considers astrophysical mod- 
elling of orbiting satellites, and Part II considers the problem of photometric rcdshift 
estimation of quasars. Within Part I, Chapter 2 deals with the fundamentals of or- 
bital mechanics and the calculation of satellite position according to Kepler's laws. 
Chapter 3 discusses the genetic algorithm approach to this modelling problem, and 
Chapter 4 compares this to an approach based on particle swarm optimisation. In 
Part II, Chapter 5 presents the topic of rcdshift estimation of quasars. Chapters 6 
and 7 discuss artificial neural networks and radial basis function networks as tech- 
niques for redshift estimation and present analyses of the results, and Chapter 8 is 
discussion. Finally, by way of conclusion, Chapter 9 elaborates on the implications 
of this research and on possible future directions. 



Part I 



Astrophysical Modelling 



CHAPTER 2 



Background 

2.1. The Orbits of the Satellites of Jupiter 

The Galilean Satellites of Jupiter, so-called because of their discovery by Galileo 
in 1610, were some of the first subjects of astrophysical modelling. Galileo could 
not directly observe the four moons' orbits around the planet, but he observed 



their change in position around Jupiter over the course of a month (Figure 2.1 1. 
From these observations he was able to induce an elementary theory about what 
was causing the motions of the little 'stars': they were actually orbiting around the 
major planet. This theory had implications that fit the observed evidence, and it 
was deemed to be the most likely explanation for the phenomenon. 

The problem undertaken here is, in principle, to determine all of the six orbital 
elements p. 58] that precisely describe the orbit of a given satellite around 
Jupiter, using only observations available from telescopes on Earth. Given the well 
defined Keplerian theory of celestial mechanics, this problem is essentially one of 
function approximation in six dimensions. As such, it is similar in form to any other 
classical modelling problem in astrophysics, and demonstrates the applicability of 
learning techniques to the approximation of highly parametric models. 

More immediately, however, this solution of approximating celestial orbits can 
be used to study binary star systems, comets with highly eccentric orbits, and 
planetary star systems, as well as astrodynamical problems such as optical or lossy 
radar-based tracking of missiles and satellites. 



2.2. Data 

Accurate ephcmeridcs (calculated positions of astronomical objects) are avail- 
able from the HORIZONS SysterrQ at NASA's Jet Propulsion Laboratory. This 
system was used to simulate observations of Jupiter and one of its moons from a 
ground-based optical telescope at Oxford, England, over a period of 31 days, at 
intervals of 2 hours. For simplicity, observations were simulated during daylight 

^http : //ssd. jpl .nasa. gov/?horizons 
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2.2. DATA 3 
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Figure 2.1. Galileo's observations of Jupiter in Sidereus Nuncius. 
From http : //www. hps . cam. ac . uk/ starry/galileo .html; image 
courtesy of the Master and Fellows of Trinity College, Cambridge. 



as well as at night. The data provided were in the form of right ascension (a) 
and declination (6) coordinates — references to an astronomical coordinate system 
independent of the rotation of the Earth p. 56]. 



4 2. BACKGROUND 

These two-dimensional coordinates essentially show the location of Jupiter and 
one of its moons on an XY-plane, and, together with an arbitrarjj^] distance, are 
sufficient to describe a satellite's orbit precisely over time. However, in order to save 
time and minimise computational rounding errors, we opted to use data describing 
the relative location of a moon to Jupiter in three dimensions. While this is not just 
a simple geometric transformation of two-dimensional data into three-space (it does 
contain more information per se), the techniques should be similarly effective on 
the 2-D data: they would simply take longer to converge in light of the additional 
ambiguity. 

We first used a test dataset with simple orbital elements to optimise the GA 
and PSO parameters, and then we applied the techniques with these parameters to 
actual satellite ephemerides. 



2.3. Classical Orbital Elements 

Six quantities p. 58] are enough, using classical mechanics, to describe the 
orbit of a satellite around a major body: 

(1) a, semi-major axis, half the length of the longest line that can be drawn 
across the orbital ellipse, 

(2) e, eccentricity, varying from (circular orbit) to nearly 1 (almost para- 
bolic) , 

(3) i, inclination, the angle at which the orbital plane is removed from the 
equatorial (fundamental) plane, 

(4) f2, longitude of the ascending node, the angle at which the satellite 'as- 
cends' (in a northerly direction) across the equatorial plane measured 
anticlockwise from the reference direction (7, where a — 0°) when viewed 
from the north side of the major body, 

(5) lu, argument of periapsis, the angle in the orbital plane between the as- 
cending node and periapsis — the point at which the satellite is closest to 
the major body, and 



Using only point coordinates in two-space, it is impossible to calculate certain orbital el- 
ements and target distance simultaneously. The semi-major axis (see j ]2.3| l of the moon's orbit 
would vary linearly with distance. 



2.3. CLASSICAL ORBITAL ELEMENTS 



5 




Figure 2.2. Classical angular orbital elements. From 
http : //en . wikipedia . org/wiki/Orbital_elements. 



(6) M epoc h, mean anomaly at ejwc/ij^j roughly the angle the satellite has trav- 
elled about the centre of the major body at a predefined time epoc/i|^] 
measured anticlockwise from periapsis when viewed from the north. 



See Figure [2T2| for an illustration of the angular elements. In the figure, true anomaly 
v can be thought of as representing mean anomaly at a given time M, although it 



is calculated according to (2.2a I. 



For comparison to mean orbital elements at http://ssd.jpl.nasa.gOv/7sat_elem#jupiter, 
this approach deviates from ]f\: Bate et al. (1971) refer instead to T, the time of periapsis 
passage. M epoc h can be derived from T given the epoch, current time i, current mean anomaly 
M, and lj. 

4 The term epoch is used because orbital relationships slowly deviate (precess) over time, and 
an approximate time frame must therefore be specified. 



6 2. BACKGROUND 

2.4. The Kepler Problem 

The calculation of position in an orbit (in terms of the orbital elements de- 
scribed in S 2.3 1 as a function of time is known as the Kepler problem^] as it requires 
the solution of Kepler's equation [JJ pp. 185, 193]: 

(2.1) M = M epoch + n-At = E-e- sin(E) 

where n is the mean motion^ A^] is the time elapsed since the current epoch, E 
is the eccentric anomaly [7, p. 183], another measure of progression in orbit, and 
M e p OC h and e are as above. 

Beginning with hypothetical values of M epoc h, n, and At that we want to test, 
we approximate E (as described below), and then solve for the true anomaly v and 
distance p, simple polar coordinates representing the position of the satellite about 
the orbital plane: 

(2.2a) v = 2 • arctan (y ^ ^ ' tan — ) 

1 -e 2 

(2.2b) p = a 



1 + e ■ cos(v) 

From v and p, a simple transformation into Cartesian coordinates in 3-space — where 
the x-axis is aligned with the reference direction in the equatorial plane (a = 0°), 
the y-axis is positive at a = 90°, and the z-axis is positive northwards — allows us 
to compare our calculation with our 'observational' data. 

As we cannot isolate the value of E in Kepler's equation (it is 'transcendental' 
in E), we must use a numerical solution to approximate it. This we have done with 
nine iterations (beginning with Eq — M) of = M + e ■ sin(Ei)^ 



The term 'Keplerian problem' is occasionally used, where the Kepler 
problem is taken to mean the two-body problem in classical mechanics; cf. 

http : //en . wikipedia . org/wiki/Keplerian_problem. 

^The mean motion is the average rate of progression of a satellite around its major body, 

yji/a?, where the standard gravitational parameter /x = GM, the gravitational constant times 
the mass of the major body. 

^We deviate from the presentation of Bate et al. (1971) again, for consistency with |6]l above. 

'"'http : / /en . wikipedia . org/wiki/Eccentric_anomaly 



CHAPTER 3 



Genetic Algorithms for Astrophysical Modelling 

3.1. Fundamentals of GAs 

The inspiration for genetic algorithms (GAs) comes from the biological pro- 
cesses of evolution and natural selection [301 p. 222]. In constructing a GA, one 
first initialises a population of randomised organisms, each of which represents 
the set of parameters one wants to approximate. The population is then evolved 
through many generations wherein organisms are allowed to crossover with each 
other, thereby creating offspring organisms that take parameters from both parent 
organisms. 

In each generation, given some definition of fitness, the healthier organisms 
are (probabilistically) more likely to survive, allowing the overall population to 
approach the desired characteristics. A fraction of organisms are mutated in each 
generation as well, to reduce the likelihood of convergence on a local (rather than 



global) minimum. A description of a standard GA is given in Table 3.1 



3.1.1. Advantages of GAs. As GAs are flexible and easily implemented, 
they provide a straight-forward solution to the modelling problem presented here. 
Simple modification of the fitness function would allow us to encourage the GA to 
converge on some orbital elements more strictly than others, if desired; additionally, 
incorporation of error margins into the fitness function would enable us to account 
for measurement errors and systematic deviations in our observational training 
dataQ 

Stochastic elements in the GA induce a "randomised, parallel beam search," 
|29l pp. 252, 259] which is important to keep in mind when approximating highly 
nonlinear functions. In particular, although the GA is not immune to the problem 
of local minima in the error function (equivalent to local maxima in the fitness func- 
tion), it avoids some types of local minima that afflict particle swarm optimisation 
(see Chapter [4]), as its crossover behaviour tends to create hypotheses that are dras- 
tically different from those in the existing population. We also discuss some simple 



^This idea was suggested by Dr Daniel Reichart, Dept of Physics and Astronomy, UNC-CH. 

7 
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GA(Fitness, p, s, m) 

Fitness: Function that calculates accuracy. 
p: Size of the population. 

s: Fraction of the population 'selected' between generations; others are replaced by 

crossover. 

m: Mutation rate. 

Initialise: P <— Randomly generate p organisms. 

Evaluate: For each o in P, find Fitness(o) . 

Create a new generation, Ps: 

(1) Select: Probabilistically select s • p members of P to add to Ps- The 

probability Pr(o) of selecting organism Oi from P is 

Fitness{oj) 
{0l) ~ £? =1 Fitness( 0j ) ' 

(2) Crossover: Probabilistically select 2 p P arrs °f organisms from P ac- 
cording to Pr(o 4 ) above. Crossover each pair, adding offspring to Ps- 

(3) Mutate: Choose m percent of the members of P with uniform probability, 
randomly modifying their parameters slightly. 

(4) Update: P <- P s . 

(5) Evaluate: For each o in P, find Fitness(o). 

(6) Repeat: Repeat if minimum fitness or generation threshold has not been 
reached, or until manually halted. 

Return: Output the organism o in P with the highest fitness. 

Table 3.1. A canonical genetic algorithm. Adapted from Mitchell 
(1997) [29j P- 251]. 



methods of parallelising GAs ([3.4.3), for example, for use on high-performance 



computing clusters for terascale and larger astronomical applications. 

3.2. Implementation of GAs 
The GA was encodecj^] and parameterised as follows. 

3.2.1. Organisms and Population. An organism is simply a data structure 
representation of the six orbital elements described in |2.3| Angular values are 
represented in radians, distance is represented in metres (as a long long) and 
long double^Jare used whenever possible to minimise floating-point errors. Initial 



2 ™ 
Both methods presented here were coded in C++ in a Windows environment, under 

Microsoft® Visual Studio 2008. See Appendix^ for the relevant source. 



3t 



The data type long double is equivalent to double in our C++ implementation. 
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values arc selected uniformly at random from appropriate ranges, and population 
sizes from 100 to 100,000 are tested. 

3.2.2. Crossover and Mutation. Crossover is handled more stochastically 
than in the canonical algorithmic implementation and is similar to a uniform 
crossover [29, pp. 254-255]. First, from the pool of organisms that have been 
selected from the previous generation, two are chosen uniformly at random with 
replacement. A crossover probability c in the interval [0, 1] is defined (in our case as 
0.25), and two children organisms are created such that they are identical copies of 
their parents — except with the independent probability c that any of their orbital 
elements have been swapped. That is, with probability c, the two children have 
swapped their parents' semi-major axis values, and with the same probability, they 
have swapped their parents' inclination values, and so on. 

An overall mutation rate is defined (for us, m — 0.15), and this percentage of 
organisms may be mutated after crossover. With another independent probability 
(again m, = 0.15), an individual orbital parameter is mutated: half of these are 
multiplied by a double selected uniformly from [0.75, 1.25], and half of these are 
completely randomised, as at initialisation. This technique allows some values to 
be slightly adjusted after the population has begun to converge. Also, note that the 
independent mutation rate for an individual parameter is (0.15) 2 = 2.25%, and the 
probability that one of the six parameters is changed is 1 — (1 — 0.0225) 6 = 12.76%, 
a very high ratej^Jwhich we have chosen in light of this problem's rapid convergence 
to local error minima and severe nonlincarities apparent in several dimensions. 

3.2.3. Fitness. Fitness of an organism is determined by calculating the esti- 
mated position of the satellite (given its hypothetical orbital elements) relative to 
Jupiter at every time step ti for which there is an observational datum. The Eu- 
clidean distance Ap(i) between the satellite's estimated position and its observed 
position is calculated: 

(3.1) Ap(i) = y/(x e (i) - x Q (i)) 2 + (y e (z) - y (i)Y + (z e (i) - z (i))* 

and the fitness of the organism is set to the inverse of this distance, ^p(i) , averaged 
over all time steps ti. Thus, organisms that predict satellite positions closer to the 
relevant observations have higher fitness values, and all fitness values are in the set 
M+. 

4 Cf. Mitchell (1997) [29] p. 256], who suggests 0.1%, and Negnevitsky (2002) 30, p. 226], 
who suggests 0.1% to 1.0%. 
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Ap 


Relative Fitness 


Po 


1/Po = fitness Q 


0.75 • po 


1.3 • fitnesso 


0.5 -po 


2 ■ fitnesso 


0.25 - po 


4 • fitnesso 


0.1 - Po 


10 • fitnesso 



Table 3.2. Fitnesses associated with declining Ap values. 



3.2.4. Selection. Before crossover, a predefined percentage (called the re- 
placement or selection rate) of organisms (we use s = 0.7) are probabilistically 
selected according to their fitness values; specifically, an organism o has probability 

/QO > 7 fitness o 

(3.2) probselecto = = — 

l^i£unselected f ltness i 

of being selected to persist in the next generation [29, p. 251], and this selection 
is repeated from the original population (without replacement) until s ■ popSize 
elements have been chosen. Note that, owing to the inverse linear relationship 
established for fitness in |3.2.3[ organisms with lower Ap values are significantly 
more likely to be selected between generations, creating a greedier algorithm, as 
seen in Table l3~2l 



3.2.5. Termination Conditions. One of the difficulties of using GAs and 
PSO is that convergence to global minima in the error function is not guaranteed; 
therefore predefined termination conditions are often difficult to set or even in- 
appropriate, if the likelihood of convergence is not well understood. As such, no 
automated termination conditions were imposed; the algorithm was halted at will 
once useful data had been obtained (typically when 4-10 7 < popSize- generations < 
8- 10 7 ). 

3.3. Results and Analysis 

3.3.1. Artificial Dataset. We first tested our algorithm and observed con- 
vergence rates on an artificial dataset with known orbital elements [a = 10 8 ,e = 
0, i = ir, £1 — (oj + Mepoch) = (mod 27r)] to decide which GA parameters to ap- 
ply to our real data. To analyse the performance of our algorithm, we measured 
best fitness, worst fitness, mean fitness u fitness, and standard deviation of fitness 



o~ fitness over G generations as described in £3.2.5 We did not measure the mean 
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Size 


Best Fitness 


...at Gen. 


Total Gen's 


Runtime per 10 5 Gen's 


100 


4.78825e-05 


40669 


300000 


396.23 s 


1000 


3.1155e-06 


11715 


40000 


6002.8 s 


10000 


1.74912e-06 


345 


2000 


150530 s 



Table 3.3. GA performance on the artificial dataset. We use best 
fitness at the final generation, as in the canonical GA presented in 



Table 3.1 but we include the generation at which this fitness or 
better was first achieved. Note that runtime is related to dataset 



size. 



Size 


a 


e 


i 


n - (w + M epoch ) 


100 


99996000 





3.14154 


0.290308 


1000 


99983731 


1.62e-06 


3.14159 


1.18074 


10000 


100021587 


1.02e-04 


3.1414 


4.7178 



Table 3.4. GA results on the artificial dataset. 



and variance of each of the six orbital elements individually, as we were mainly con- 
cerned with the relationship between fitness, fj,, and a in light of the termination 
problem, not with the algorithm's movement over our particular six-dimensional 
search space. 

We tested GAs with population sizes of 100, 1,000, 10,000, and 100,000, al- 
though populations of 100,000 did not evolve sufficiently quickly on our hardwar^] 
to be useful. A brief overview of best results achieved with these populations is 



presented in Tables 3.3 and 3.4 



Although the GA with population size 100 achieved the best result, it was also 
most susceptible to falling into local minima in the search space. Because the local 
minima for this dataset are well known, GA runs that converged on local solutions 
could be manually excluded. However, on a real dataset for which the search space is 
poorly understood, these local minimum errors cannot be automatically corrected; 
a GA approaching a local minimum is theoretically identical in behaviour to one 



approaching the global minimum, as seen in Figures |3.1[ 3.2 and 3.3 However, 



the lower standard deviation in early generations (early convergence) on the poorer 



Most simulations were run on a 2 GHz Intel® Core 2 Duo processor with 2 GB RAM. 
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Figure 3.1. Overall best fitness of two GAs of size 100 over 
100,000 generations. The dashed-line GA has fallen into a local 
error minimum. 



GA in Figure [33] is possibly an indication of its having fallen into a local minimum, 
judging qualitatively from similar plots such as Figure [3~4| 

GAs with smaller populations have the advantage that their organisms evolve 
more quickly and can therefore arrive at more 'finely tuned' solutions near the global 
minimum without expensive computation time; however, they often fail to gener- 
ate the early variation necessary to approach the global minimum. Accordingly, 
we proceed primarily with GAs of size 1,000, as these were much less susceptible 
to problems of premature convergence, yet still evolved quickly enough to yield 
acceptably precise solutions. 



3.3.2. Jupiter's Moons: Io and Himalia. We first looked to Io, as one of 

the four Galilean satellites with a stable, near-circular orbit. However, because most 
moons with stable circular orbits have an inclination near 0, the dimensionality of 
our search space for Io was reduced — or, rather, our algorithm had to explore a 
very flat search space over the parameters f2, u>, and M epoc h in the region of i — 0. 
We were still able to converge on some of Io's orbital parameters, as illustrated in 
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x 10 



Figure 3.2. fJ, fitness of the two GAs in Figure 3.1 Mean fitness, 
smoothed over 500 generations, tends to approximate the fitness 
of the current (surviving) best organism. 



Element 


GA Approximation 


Actual Value 


Error 


a 


421075000 


421800000 


0.00172 


e 


0.00555 


0.0041 


0.00145 


i 


9.68e-05 


(0.036) 


(0.0057) 



Table 3.5. Results on Io for a GA with population 1,000, af- 
ter 15,000 generations. The actual inclination i is referred to the 
Laplace plane, but for a large moon like Io this is roughly the same 
as Jupiter's equatorial plane. 



Table |3.5| but for the others, the search area around the global minimum was too 
flat for our algorithm to efficiently approach the correct values. 

We then chose Himalia, a smaller moon with a more eccentric orbit and a sig- 
nificant inclination. However, although our GA should have been able to converge 
on more of the orbital elements of Himalia, it became apparent that we would not 
be able to compare our estimated i, f2, lo, or M epoc h values to real data: the values 
of these parameters presented in the literature |24j for all of Jupiter's moons are 
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Figure 3.3. o fitness of the two GAs in Figure 3.1 smoothed over 
500 generations. 



referred to the Laplace plane j^] while our code was designed to calculate orbital ele- 
ments referred to a planet's equatorial plane. Moreover, it is impossible to translate 
between Laplace plane values and equatorial ones without more information about 
the precise orientation of a satellite's orbit in a given epoch. Still, we present our 
best converged values of a and e for Himalia in Table |3.6| 



3.3.3. Saturn's Moon: Atlas. The moons most similar to Jupiter's with 
data that can be referred to a planetary equatorial plane are those of Saturn. Ac- 
cordingly, we chose one of Saturn's major moons, Atlas, for a complete application 
of our learning algorithms. Atlas, like all of Saturn's inner satellites, has a low 
inclination and so has a very flat search space in the region of the global minimum, 
but we can at least compare our GA results to all six of its orbital elements. The 



The orbits of small satellites tend to precess more rapidly than those of larger satellites, 
meaning that the pole of the satellite's orbital plane moves gyroscopically about another pole — 
the pole of its Laplace plane. The Laplace plane thus represents something of an 'average' orbital 
plane if one integrates over the full precession period of a satellite's orbit. 
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IS 




Figure 3.4. af itness of two GAs of size 1,000 over 80,000 genera- 
tions. The dashed line describes a GA that has fallen into a local 
minimum; note its low variance in early generations. 



Element 


GA Approximation 


Actual Value 


Error 


a 


11315960111 


11461000000 


0.0127 


e 


0.100526 


0.1623 


0.0618 



Table 3.6. GA results on Himalia with population 1,000, after 
40,000 generations. Slightly better (2.8% gain in fitness) results 
were achieved with a GA of size 100 after 200,000 generations, 
although it suffered from the local minimum problem described in 
P34| 



orbits of Saturn's smaller, outer satellites, being more inclined and eccentric, are 
not measured in reference to the planet's equatorial planeQ 

The best solution for Atlas's orbital elements found by our GA is presented in 



Table 3.7 We see that the GA converged on very accurate values of a, e, and i, 
but, as expected, was unable to effectively traverse the flat search area around the 
global minimum with respect to the other three parameters. 



^See http : / /ssd. jpl .nasa. gov/?sat_elem#saturn. 
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Element 


GA Approximation 


Actual Value 


Error 


a 


137450238 


137670000 


0.0127 


e 


4.275e-08 


0.0012 


0.0012 


i 


9.470e-08 


0.003 


0.0005 





4.890 


0.0087 


0.777 




3.439 


5.786 


0.374 


^■epoch 


5.441 


2.753 


0.428 



Table 3.7. GA results on Atlas with population 1,000, after 
80,000 generations. 

Figure |3.5| shows the overall best fitness over time along with the mean fitness 
in the population and the standard deviation of fitness (after selection and before 
and after crossover, fj, fitness and <J fitness remain almost unchanged), where the 
latter two values are smoothed over 1,000 generations. In the GA implementation, 
the mean fitness of the population is nearly identical (when averaged over some 
hundreds of generations) to the best fitness observed; the population does not 
periodically decrease in fitness. Note that this does not mean that our algorithm 
behaves greedily; it merely indicates that a single organism attaining a higher fitness 
tends to bring a majority of the population up to its fitness level. This does not 
in itself indicate convergence: the entire population may have the same fitness but 
may still be dispersed about the search space. That increases in best fitness quickly 
bring other organisms to higher fitness levels is also illustrated in Figure |3.6[ where 
the sharp rises and falls of a fitness are consistent with these shifts in fitness. 

3.4. Discussion 

3.4.1. Difficulties of Convergence. As discussed earlier, large moons tend 
to orbit their major planets at very small inclination, such that i ~ (mod it). This 
not only creates a relatively flat search space in the region of i = 0, but also induces 
a dependent relationship between the three other angular orbital parameters. That 
is, Q, to, and M epoc h will have higher fitness when they vary according to the 
relation + lo + M epoc h — t for some value t € [0, 2ir). This dependence effectively 
disallows any significant movement in a single dimension; in order to approach 
optimal values, organisms must cither mutate by very small amounts in single 
dimensions or must mutate in multiple dimensions simultaneously while roughly 
obeying the given relation. 
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Figure 3.5. Log plot of overall best fitness, /J, fitness (dashed 
curve), and u fitness (dotted curve) of a size 1,000 GA running 
on Atlas for 80,000 generations. For clarity, \i fitness is transposed 
downwards by a factor of 2. 

As it is, this dependence tends to bring about convergence of f2, u), and M epoch 
in our implementation on parameter values that are insufficiently optimal. Once 
convergence over these three parameters has occurred, it is very unlikely that the 
GA will approach more optimal values for them. 

3.4.2. Cross- Fertilisation of Populations. We have mentioned that smaller 
populations can approach more optimal solutions in fixed computational time, al- 
though larger populations are better suited to avoiding local minima in the search 
space — particularly in early generations. One approach that could take advantage 
of both of these strengths is that of cross- fertilisation]^] 

Multiple GAs could be run with completely distinct populations of differing 
sizes, such that from time to time some organisms are migrated between popula- 
tions. These organisms could be randomly chosen or could be selected by fitness 



Cf. Mitchell (1997) 1291 p. 268], who discusses this in the context of parallelisation. 
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X 10 

FIGURE 3.6. Linear plot of the GA in Figure [375] overall best fit- 
ness with a fitness- & fitness is smoothed over 1,000 generations, yet 
still indicates great increases in fitness variance over the population 
of 1.000 when the overall best fitness increases. 



(for a greedier implementation) , and they could move between all populations or 
only from larger populations to smaller ones, to avoid local minima traps. 

An attempt at mimicking this cross-fertilisation was made by seeding a GA 
of size 100 with a, e, and i values found by the size-1,000 GA (as though cross- 
fertilising the smaller population with the most fit organism from the larger pop- 



ulation), as listed in Table 3.8 This placed the 100 new organisms in the region 
of the global minimum, and allowed them to explore the search space over 100,000 
additional generations. Fitness improved by 18.8% (not an order of magnitude 
increase), and although the smaller GA still converged relatively quickly on f2, cu, 



and M epoc h as described in £3.4.1 it still achieved more accurate values of these 



three parameters as shown in Table |3.8[ The GA's behaviour is further illustrated 
in Figure |3.7| the lack of subsequent convergence after seeding and after 100,000 
generations could also point to numerical difficulties in calculating precise satellite 
positions along the ephemerides. 
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Element 


GA Approximation 


Actual Value 


Error 


a 


137450238 


137670000 


0.0127 


e 


0.0008922 


0.0012 


0.0003 


i 


9.353e-09 


0.003 


0.0005 





1.157 


0.0087 


0.183 




6.112 


5.786 


0.0519 


^■epoch 


0.217 


2.753 


0.404 



Table 3.8. GA results on Atlas with a seeded population of 100, 
after 100,000 generations. 




x 10 



Figure 3.7. Overall best fitness, \i fitness (dotted curve), and 
c fitness of the size-100 GA after seeding, smoothed over 1,000 
generations. The absence of order-of-magnitude increases in fit- 
ness and the relatively stable & fitness could indicate a numerical 
impossibility of convergence any closer to the global minimum. 

3.4.3. Parallelising GAs. An idea closely related to that of cross- fertilisation 
is that of parallelisation, which aims primarily at distributing the GA for efficient 
computation on decentralised systems. Besides the method of running distinct 
populations on different machines that are then cross- fertilised, one can also create 
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a non-traditional GA that operates not in synchronised generations, but utilises 
one centrally managed population^] 

The population could be kept on one central machine, which would be made 
globally accessible. 'Worker' threads on ancillary machines would request two ran- 
domly selected members of the central population, breed a new organism, and 
suggest it to a manager thread for inclusion if its fitness is higher than the lowest 
fitness in the major population. Probabilistic methods of selection could also be 
used in the inclusion routine, to avoid greediness: when replacing an old organism 
in the population with a newer one, the organism that 'dies' could be selected at 
random with weighting according to fitness, instead of having the manager thread 
deterministically select the least fit organism to be replaced. 

This parallelisation and the cross- fertilisation parallelisation of Mitchell (1997) 
|29l p. 268] could be run asynchronously on cluster setups with any number of 
machines of any speed. Such an implementation could be especially useful for 
solving astrophysical problems without expensive computing resources, as discussed 
m Chapter |] 



This idea is due to Mr Andrew Foster, UNC-CH, who implemented it for an astrophysical 
application. 



CHAPTER 4 



Particle Swarm Optimisation for Astrophysical 

Modelling 

4.1. Fundamentals of PSO 

Particle swarm optimisation (PSO) is another learning technique inspired by 
a physical process — the behaviour of flocks of animals or swarms of insects. An 
implementation of PSO consists of an initially randomised swarm of particles, each 
representing a hypothesis in the search space, wherein each particle has its own 
velocity vector over the set of search parameters. Thus, each particle is 'moving' 
independently about the search space. 

At each step of the algorithm, the velocity of each particle is adjusted to ap- 
proach stochastically the position of the current 'best' particle or the overall 'best' 
particle, for some given fitness function. The algorithm must thus remember the 
overall 'best' particle, but this is minimally memory-intensive. The basic update 
equations are as follows. 

(4.1a) v[i] = v[i] + 2 x rand() x (p curre ntbest - p[i]) 

+ 2 X rand() X (Poverallbest - p[i]) 

(4.1b) p\i] =p[i]+v[i] 

In these equations [26 , p[i] and v[z] represent the position and velocity vectors of 
a particle i in the six-dimensional parameter space; p C urrentbest and Poverallbest are 
the 'best' particles in the swarm, as described above; and randQ is a random real 
number in the interval [0, 1]. 

4.1.1. Advantages of PSO. Without the crossover and mutation operators 
of a GA, PSO has fewer parameters to tweak for any particular application. The 
velocity update feature allows the swarm to accelerate (up to some optionally speci- 
fied maximum velocity v max ) towards minima in the search space, while its velocity 



components themselves, combined with the stochastic element in equation (4.1al, 
allow the algorithm to avoid converging on local minima. The minimal need for 

21 
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centralised information exchange in a PSO algorithm also make the technique well 
suited for distributed processing. 



4.2. Implementation of PSO 

4.2.1. Particles and Swarm. As in the GA implementation, a particle in 
PSO primarily represents the six orbital parameters in a data structure. Addition- 
ally, velocities of the parameters over the search space are represented, with a max- 



imum velocity bound (see ^ 4.2.2.2 1 imposed on five of the parameters. Lower max- 
imum velocities encourage faster convergence and less erratic particle behaviour, 
but they also incline the algorithm more towards early convergence to local minima 
in the parameter space. Swarm sizes are similar to population sizes for the GA 
implementation, i.e., 100 to 100,000. 

4.2.2. Problem- Specific Methods. 

4.2.2.1. Circular Search Dimensions. Since the particles in the PSO implemen- 
tation are 'moving' about the search space with certain velocities, a problem arose 
with the angular value representations of four of the orbital parameters. Since these 
parameters are only valid in the range [0,27r) radians (or rather since a broader 
search space where values are congruent modulo 2tt tends to prevent PSO conver- 
gence), one obvious solution would be to impose 'physical' limits on the particles to 
prevent them from exploring outside of this interval. However, this limiting induces 
the particles to converge prematurely at the edges of the search space: high veloci- 
ties send a few particles near the boundaries, where they halt; the more fit particles 
in these clusters attract other particles to the boundaries, and the convergence 
problem worsens. 

The first solution to this problem is to create a circular search space in the four 
dimensions with angular values. Such a search space has two important properties: 

(1) Particles are allowed to move beyond the interval [0,27r), but their radian 
values are immediately adjusted (mod 2n) to replace them into the interval 
before further calculations; and 

(2) When calculating velocity updates, particles do not simply move towards 
the linear values of the best-fit particles, but travel the shortest distance 
(arc length) around the 0-to-27r radian circle. 

For example, if a particle with position were moving towards a particle with 
position 37r/2, it would not update its velocity with the positive value (37r/2 — 0), 
but, by with the negative value — (2n — (37r/2 — 0)) = — 7r/2, inclining it towards 
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the shorter path around the radian circle. By ([T]), then, its radian value would be 
adjusted, if necessary, to keep it in the interval [0, 2tt). 

4.2.2.2. Maximum Velocities. The methods described in j ]4.2.2.1| are, however, 
not enough to guarantee a simpler PSO convergence for this problem. The following 
example best illustrates the remaining difficulty, and why we impose maximum 
velocities on the particles in five out of six dimensions. 

Suppose the current best and overall best particles are resting at a position of 
radians. A particle with high positive velocity around the circle is attempting 
to approach the 0-radian position, and is currently at tt/2 radians. Its velocity is 
updated, therefore, by adding a small negative value to its current large positive 
velocity. By the next time step, its positive velocity has lessened slightly, but it has 
continued to move past the tt radians position and is still attempting to approach 
the position. At this point, its velocity is updated with a further positive value, 
and it passes the position by the next time step, thus restarting the process and 
preventing convergence. 

One solution to this further convergence problem is to impose maximum abso- 
lute velocities on the particles, meaning that < v max for all particles i in 
each dimension j at all times. Intuitively, a maximum velocity for one of the four 
angular dimensions should be less than tt/2, so we have experimented with v max 
values between 0.1 and 1.5. We have also imposed a maximum velocity of 0.1 on 
the eccentricity (e) parameter in order to avoid the clustering problem described in 



{ 4.2.2.1 Simulations run without maximum velocities yielded both the clustering 
and convergence problems, as expected. No maximum velocity was necessary for 
the semi-major axis (a) parameter, being unbounded (or rather being limited to 
Z+) and not susceptible to either of the convergence problems described aboveQ 

4.2.3. Learning Factors. Learning factors c\ and C2, which control the ve- 
locity updates with respect to the current best and overall best particles, were both 



set to ci = c 2 = 2, as recorded in equation (4.1a). This makes it as likely that 
a particle will move towards the current best particle as towards the overall best 
particle, but reduces the greediness of the algorithm so as to better avoid local 
minima in the error function. 

4.2.4. Dimensionally Independent Velocity Updates. A more stochastic 
movement for particles was tested by assigning independent random coefficients to 



^Some clustering was observed at a = 1, but not to a significant degree. 
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each of the orbital elements in the particles' position vectors: 

(4.2) V[l] = V[i] + 2 X r (g) (p CU rrentbe S t ~ P[«]) 

+ 2 X r <g> (Tp overallbest ~ PW) 

where r is a six-dimensional vector of independently randomised reals in [0, 1] and 
the operator (g> represents element-wise multiplication. However, this modification 
made the algorithm too erratic; it would converge only minimally, although particles 
would occasionally stumble upon high-fitness solutions. Note that, even though the 
statistically expected changes in velocity are the same in each dimension: 

(4.3) E(Av[i])= 2xE(r)®(p currenftest -p[i]) 

+ 2 X E(r) ® (Poverallbest ~ P[«]) 
= 2 X 0.5 X (p currentbest ~ p[*]) + 2 X 0.5 X (Poverallbest ~ PW) 

(P currentbest P[^D (P overallbest P[^D 
= 2 X 0.5 X (p C urrentbest ~ P[«]) + 2 X 0.5 X (Poverallbest ~ p[*]) 

= 2 x E(randQ) x (p curre ntbe S t ~ p[i]) 
+ 2 x E(rand()) x (p aver allbest ~ p[i\) 

the net effect is different because the probability of uniform movement towards one 
'best' particle is significantly lower: 

(4.4) P(randi > rand 2 ) > P(n[l] > r 2 [l] A 

: A 
ri[6]»r 2 [6]), 

where randi G [0, 1] and G [0, l] 6 . 

4.2.5. Fitness and Termination Conditions. Fitness and termination con- 
ditions are as described in § §3.2.3| and |3.2.5| Algorithm runs that were observed to 
converge prematurely were halted immediately. 

4.3. Results and Analysis 

4.3.1. Artificial Dataset. Just as with our GA implementation, we tested 
the PSO technique on the artificial dataset, to begin. We used swarm sizes of 100 
and 1,000, and maximum velocities of 0.1, 0.25, 0.5, and 0.75. As shown in Tables 



4.1 and 4.2 there was no plain correspondence between parameter settings and best 
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Size 


u max 


Best Fitness 


...at Gen. 


Total Gen's 


Runtime/10 5 G's 


100 


0.1 


5 78903e-07 


6657 


80000 


599 88 s 


100 


0.25 


3.14102e-06 


42467 


80000 


606.80 s 


100 


0.5 


1.07012e-07 


76614 


80000 


611.58 s 


100 


0.75 


2.88145e-07 


44341 


80000 


594.45 s 


1000 


0.1 


3.70166e-07 


10886 


20000 


2779.1 s 


1000 


0.25 


4.11467e-07 


8830 


20000 


2858.8 s 


1000 


0.5 


1.11125e-06 


9898 


20000 


2886.8 s 


1000 


0.75 


3.02420e-07 


6137 


20000 


2812.3 s 



Table 4.1. PSO performance on the artificial dataset. Best fitness 



is the overall best fitness observed. 

fitness achieved, but parameters did affect the convergence behaviour of the swarm 
as regards precision around the global minimum and the local minima problem of 

Smaller swarms could run through more generations than larger swarms, as 
with GAs, but many of the PSO runs achieved their best overall fitness at or before 
running through 50% of their total generations. This attests to the very erratic 
behaviour of the particles in the search space; running through more generations 
often did not improve the algorithm's result, although more generations ought the- 
oretically to yield improvement. It was more important that our algorithms did not 
converge prematurely at the beginning of their runs (for which higher v max values 
and larger swarms were better), and that their particles eventually converged once 
in the region of the search space near the global minimum (for which lower v max 
values were better). One possible remedy for this v max dilemma is presented in 

El 

4.3.2. Himalia. PSO was not tested on Jupiter's moon Io, but the algorithm 
achieved results with a 58% improvement in fitness over our GA's results after 
60,000 generations with a swarm of 1,000. However, these improvements do not 
show through, as most of the accuracy increase was apparently in the four orbital 
parameters whose correct values we cannot transform out of the Laplace plane. 
Even still, the error of the a estimate seen in Table [4~3] is noticeably high. 

4.3.3. Atlas. For Saturn's moon Atlas, we first present results from PSO runs 
of size 1,000 and v max between 0.5 and 1.0; smaller swarms and lower v max values 
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Size 


1] 

u max 


a 




% 


a:, lvJ -epocn ) 


100 


0.1 


100033396 


o 


3.14054 


3 85625 


100 


0.25 


100009747 





3.1414 


5.57166 


100 


0.5 


100023939 





3.14632 


4.52465 


100 


0.75 


100030855 





3.1429 


4.03221 


1000 


0.1 


99964288 





3.14294 


2.59303 


1000 


0.25 


100032127 





3.14362 


3.95884 


1000 


0.5 


99992860 





3.14103 


0.52695 


1000 


0.75 


99963548 





3.14104 


2.63425 



Table 4.2. PSO results on the artificial dataset. The e values are 
all nought because no maximum velocity had yet been set on this 



parameter; the clustering problem of §4.2.2. 1| yielded these values. 



Element 


GA Approximation 


Actual Value 


Error 


a 


14763515741 


11461000000 


0.2882 


e 


0.2423 


0.1623 


0.0800 



Table 4.3. PSO results on Himalia with population 1,000 and 
Vmax — 0.5, after 60,000 generations. Other parameter settings 
produced more accurate a and e estimates, but the run presented 
here had the highest overall fitness. 



either fell too quickly into local minima or converged nearly immediately in a flat 
region of the search space very far from the global minimum. We then present 
an analysis of our PSO runs, including a couple runs that fell into local minima, 
and discuss how the successes and failures are distinguishable by their convergence 
behaviour. Then, a remedy for the local minimum convergence problem is presented 



in HA 



Our best PSO results were achieved with a v max setting of 0.5, when the al- 
gorithm did not fall into a local minimum. A fitness slightly less that of our GA 
was achieved, although it was of the same order of magnitude. The approximated 
orbital elements are presented in Table |4.4| We sec from these elements that the 
PSO, like the GA in |3.3.3| had difficulty optimising fitness in the flat search space 
in the vicinity of the global minimum (where a, e, and i are close to the actual 
values) . 
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Element 


GA Approximation 


Actual Value 


Error 


a 


137446613 


137670000 


0.0016 


e 





0.0012 


0.0012 


i 


8.579e-05 


0.003 


0.0029 





3.112 


0.0087 


0.494 




1.682 


5.786 


0.653 


^■epoch 


2.682 


2.753 


0.0113 



Table 4.4. PSO results on Atlas with population 1,000 and 
Vmax = 0.5, after 80,000 generations. 



However, many of our PSO runs did not approach the global minimum, but in- 
stead converged prematurely at the beginning of the run (usually with swarms 
smaller than 1,000 and/or v max < 0.5) or fell into a local minimum, such as 
a = 40635000, i = vr. The premature convergence is easily detected (as variance 
quickly drops to nought), but the local minima problem involves more subtle swarm 
behaviour in our particular search space. We present some observations of this be- 
haviour that may be useful in other search spaces with similar local minima features. 

Particle swarms that have fallen into a local minimum exhibit a couple well 
defined characteristics: (1) when smoothed, ujuness does not show any order-of- 
magnitude increases over the course of the run (often quickly decreasing from its 
initial value), and (2) when smoothed, \x fitness an d (J fitness have a negative corre- 
lation, such that ^fitness + o 'fitness remains roughly constant. On the other hand, 
swarms that approach the global minimum exhibit (3) a sharp drop in /i fitness, (4) 
a sharp rise in a fitness and (5) a positive correlation between the current best value 
and cr fitness, once all values have been smoothed. 



Characteristics (1) and (2) are illustrated in Figure 4.1 the low a fitness value 
probably attests to early convergence in the search space, or at least convergence 
towards a value (e.g., a — I) that produces similar fitnesses. Eventually, a low 
& fitness value is desirable, but not at the outset of the run where diversity of pa- 
rameter values in the swarm is important for a fuller exploration of the search space. 
The negative correlation between fi fitness and (J fitness (2) in failing swarms could 
also attest to the small-scale introduction of diversity in the swarm when a fitness 
rises. This diversity already exists in successful swarms that are approaching the 



global minimum, so there is no strong negative correlation, seen in Figure 4.2 
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5£ 10 4 



Figure 4.1. ^fitness (blue) and a fitness (green) of particle swarms 
with v m ax = [0.5, 1.0] that have fallen into local minima, a fitness 
remains near its initial value, which represents the variance of ran- 
domly initialised orbital elements. 
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FIGURE 4.2. fifuness (blue) and af itnesB (green) of successful par- 
ticle swarms with v max — [0.5,0.75,1.0]. Note the early drop in 
A 4 fitness and the early increase in a fitness, as well as their lack of 
clear correlation. 
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Characteristics (3) and (4), also visible in Figure 4.2 are also due to the early 
introduction of diversity into the successful particle swarms. Surprisingly, (J. fitness 
drops below its initial value (which represented the mean fitness of randomly ini- 
tialised particles); this could indicate a search space feature similar to a steep hy- 
perdimensional Mexican hat function (where high-fitness values are surrounded by 
below-average-fitness values), called the Laplacian of Gaussian function^] Finally, 
the positive correlation between the current best fitness and a fitness (5), seen in 



Figure 4.3 highlights the multiple-order-of-magnitude leaps in the current best fit- 



ness relation (see also Figure 4.5 1 for successful particle swarms; the lack of this 
relation in failing swarms is seen in Figure |4.4| 

As seen in Figure [475] the smaller v max value of 0.5 brings about many more 
high-fitness peaks above the 10 -8 fitness level than do larger v max values, by moving 
around more 'slowly' close to the global minimum. Unfortunately, smaller v max 
values often prematurely converge or fall into local minima, but a solution for this 



dilemma is presented in [4.4 



4.4. Discussion 



As mentioned in [4.3 smaller swarm sizes and smaller values of v max often 
converge prematurely, but smaller values of v max yield more precise solutions if the 
swarm approaches the global minimum, and smaller swarm sizes can sometimes 
be more computationally efficient. In order to leverage the advantages of smaller 
swarms and smaller v max values while avoiding premature convergence, we imple- 
mented a simple, tapered v max setting: this tapered v max is initially very high to 
avoid early convergence and allow the particles to explore the search space, but it 
is gradually reduced over the course of the run so that a very small v max is enforced 
once the swarm has located the region of the global minimum. 

We present three runs: firstly, a swarm of 1,000 with a short taper to 80,000 
generations, secondly, a swarm of 1,000 with a longer taper to 100,000 generations, 



and thirdly, a swarm of 100 with the longer taper to 200,000 generations. Table 4.5 
presents the tapered v max values and their corresponding generation marks. 

All runs of the tapered PSO algorithm (including several not presented here) 
successfully avoided premature convergence, and all approached the global mini- 
mum. Additionally, all achieved significant improvements in fitness over our best 
performing basic PSO run (although still not reaching the fitness of our GA run), 



2 http: //en . wikipedia. org/wiki/Mexican_hat_wavelet 
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Figure 4.3. Current best fitness (blue) and a fitness (green) of 
particle swarms sized 1,000 with v max = [0.5, 0.75, 1.0]. The major 
increases in a fitness correspond to enormous leaps in current best 
fitness values, creating the distinct positive correlation. 
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Figure 4.4. Current best fitness (blue) and a fitness (g reen ) of 
failing particle swarms of size 1,000 with v max = [0.5,1.0]. The 
lack of positive correlation is due to the lack of order-of-magnitude 
leaps in current best fitness that would strongly affect <J fitness- 
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Figure 4.5. Overall best fitness and current best fitness, be- 
fore smoothing, of particle swarms sized 1,000 with v rnax = 
[0.5, 0.75, 1.0]. The first plot is a log plot. 
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11 

u max 


From Gen. (short) 


From Gen. (long) 


1.5 








1.0 


5000 


10000 


0.75 


10000 


20000 


0.5 


15000 


30000 


0.25 


20000 


40000 


0.1 


25000 


50000 


0.05 


30000 


60000 



Table 4.5. Incremental v max values for the tapered PSO algo- 
rithm and the generations at which they arc enforced. 



indicating that the smaller v max values in later generations were effective at in- 
creasing precision in orbital element estimations. Figures 4.6 and 4.7 display the 
familiar characteristics of successful PSO runs. 
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Figure 4.6. Overall best fitness (blue) and /i fitness (green) of the 
tapered PSO runs. fJ, fitness drops from its initial value, as ex- 
pected. 
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X 1D 



FIGURE 4.7. Current best fitness (blue) and Of itness (green) of 
the tapered PSO runs. Note the same clear positive correlation 
between current best fitness and u fitness, as well as the increas- 
ing afitness values over time. Also, no clear negative correlation 



between fJ,f itneas (Figure 4.6) and cr fitness is apparent 



Part II 

Photometric Redshift Estimation 



CHAPTER 5 



Quasar Redshifts in Optical Sky Surveys 

5.1. Quasars and Redshifts 

Quasars, or quasi-stellar objects (QSOs), are thought to be regions of gas sur- 
rounding supermassive black holes at the centres of very distant (> 800 million 
light-years) galaxies. Because quasars are at such great distances, they exhibit very 
high redshifts (z), translations of observations (on the electromagnetic spectrum) 
towards longer wavelengths (A) as a result of the expansion of the universe: 

/ \ ^observed ^expected 

(5.1) z= '- 

■^•expected 

such that l + z represents the ratio of expansion of the universe between the moment 
of the object's light emission and our current time: 

/ r o\ 1 i ^-observed 

(5.2) 1 + z = 

^expected 

That is, if an object is observed at z = 1, then the universe has expanded by a 
factor of 1 + z = 2 since the light reaching us was emitted. 

Accordingly, since distance on cosmological scales is directly correlated with 
redshift, and since quasars are the most distant directly observable objects in the 
universe, it is useful to be able to determine their redshifts accurately and efficiently 
Further study into the properties of quasars at different redshifts will contribute to 
a better understanding of cosmological evolution and the large-scale structure of 
the universe. 

Unfortunately, the most accurate method of redshift determination (using spec- 



troscopy, see ^5.3) is very time-consuming and does not scale to keep up with the 
number of quasars being identified using modern techniques: Ball et al. (to appear) 
state, "the number of spectra available typically lags the number of [photometric] 
images by more than an order of magnitude" [6] . Thus, we present here methods 
of photometric redshift estimation that are hundreds of times more efficient than 
spectroscopic determination and are increasingly consistent with the most accurate 
spectroscopic results. 
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Band 


Colour 


Wavelength 


u 


near-ultraviolet 


3590 A 


9 


green (visible) 


4810 A 


r 


red (visible) 


6230 A 


i 


far-red 


7640 A 


z 


further-red 


9060 A 



Table 5.1. ugriz filters used in the SDSS. 



5.2. The Sloan Digital Sky Survey 

The Sloan Digital Sky Survejj^SDSS) |47] is perhaps the most comprehensive 
sky survey taken to date; it has catalogued nearly 25% of the skjj^Jusing five broad- 
band photometric filters, which allow its telescope to simultaneously measure the 
luminosity of objects at different wavelengths, or in different 'colours'. The SDSS 
uses the u, g, r, i, and z filters]^] which measure light between 3590 A and 9060 A^] 
|4T] as listed in Table |5.1| 5 



5.3. Spectroscopic Redshifts in the SDSS 

Photometric data aside, the SDSS has also captured the electromagnetic spec- 
tra — energy measurements over all 'optical' wavelengths from 3800 A to 9200 A at 
a resolution of approximately A/AA = 1800 |47j ^] — of some 100,000 quasars. These 
spectra yield a great deal of information about the quasars inspected, including 
their chemical contents, temperatures, any intervening gases blocking our line of 
sight in the interstellar medium, and, most importantly for our purposes, their 
spectroscopic redshifts (z spec ). These values are the most accurate estimates of 

1 Funding for the SDSS has been provided by the Alfred P. Sloan Foundation, the Participat- 
ing Institutions, the National Science Foundation, the U.S. Department of Energy, the National 
Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, 
and the Higher Education Funding Council for England. 

2 http : //www . sdss . org/backgr ound/ 

^We use the designations ugriz and u'g'r'i'z' (as found in much of the literature) inter- 
changeably; details of their differences can be found in pQ, |17| . [31] . and especially |37j . Note 
that the ugriz photometric system supersedes the preliminary u*g*r*i*z* system in 1 3 1 1 . 

4 1 angstrom = 0.1 nanometres. 

5 Cf. also [35] and [16) for 'effective wavelength' figures. 
6 Cf. |2[ and [201. 
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rcdshift available, as they make use of data points taken across such a large region 
of the electromagnetic spectrum. 



5.4. Magnitudes and Photometric Colours 

We have noted that the SDSS uses the ugriz photometric system; each of 
these five measurements is a logarithmic measure of flux (F) — a measurement of 
the amount of light emitted by an object per unit of time — around a particular 
wavelength. Since the measurements are logarithmic (e.g., u oc \ogF u ), and since 
log a — log b — log(et/6), their differences represent flux ratios; for example, u — g oc 
log(F u /F g ). It follows from this that the photometric colours u — g, g — r, r — i, 
etc. give us information about the colour proportions of a quasar as observed from 
Earth. We use the notation C xy to denote the colour x — y as in Richards et al. 
(2001b) [32] , 



5.5. Photometric Redshift Estimation 

There are two major contributing factors in photometric redshift (z v hot) es- 
timation: firstly, the location of redshifted spectral features in wavelength space 
relative to the ranges of the broadband filters; and, relatedly, the structure of the 
colour-redshift relation (CZR). 

Independent of redshift, quasar spectra are known to exhibit certain prominent 
emission and absorption lines, notably Mg II, C III, C IV, and Lyman-a |37j . At 
different redshifts, however, these spectral lines are observable at different wave- 
lengths, and so move in and out of the ranges of the ugriz J HK bands as redshift 
increases |31j . For redshifts (z ~ 0.3) at which the Mg II emission line is observable 
in the u band, for example, the colour C ug is much bluer than usual. As Mg II 
moves into the g band at z ~ 0.6, C ug becomes redder while C gr becomes bluer 



(Figure |8.1[). 

However, when these spectral features are not observed in the bandpasses in 
use, there can be significant degeneracy in the CZR in colour-colour space (see 



Figure 8.3 1, whereby the same set of colours corresponds to more than one red- 
shift. Richards et al. (2001b) point out, then, that "z is not strictly a function of 
color" |32j. This degeneracy can lead to systematic errors in z p hot and regions of 
'catastrophic failure' [5] in the z p hot-z S pec relation. 
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Band 


Colour 


Wavelength 


J 


near-infrared 


12350 A 


H 


near-infrared 


16620 A 


K 


near-infrared 


21590 A 



Table 5.2. JHK filters used in 2MASS [20]. 



5.6. SDSS Quasar Dataset 

The data we use are from the third SDSS Quasar Catalog [35 via the Center 
for Astrostatistics at Penn State University^] This dataset primarily comprises 
46,420 quasars measured in ugriz filters with corresponding z spec values, but the 
quasars are cross-referenced with other sky surveys (FIRST [8 , RASS [3], 2MAS^] 
J) when possible. These three additional surveys add matching broadband radio, 



X-ray, and JHK (shortward of z, see Table 5.2 1 magnitudes for 3,757, 2,672, and 
6,192 quasars, respectively. 

The third edition of the quasar catalog, sourced from SDSS Data Release 3 
(DR3), contains ugriz magnitudes that have not been corrected for galactic extinc- 
tion]^] Since the effects of galactic extinction are systematic and not variable, we 
have not performed the extinction corrections on our dataset; training with artificial 
neural networks and radial basis function networks nullifies these systematics j 13] 
if the test data are also uncorrected. Leaving the data as-is also allows us to make 
use of stated one-sigma Gaussiarj 15 ] measurement errors (i.e., photometric noise es- 
timates) for ugriz JHK magnitudes without introducing additional variance for 
extinction corrections. The utility of these photometric measurement errors leads 
us to omit the radio and X-ray measurements that lack stated error bars. 



7 http : //astrostatistics .psu. edu/datasets/SDSS_quasar .html 

^This publication makes use of data products from the Two Micron All Sky Survey, which 
is a joint project of the University of Massachusetts and the Infrared Processing and Analysis 
Center/California Institute of Technology, funded by the National Aeronautics and Space Admin- 
istration and the National Science Foundation. 

"Also called reddening, extinction is the scattering of emitted light in the interstellar medium 

due to the presence of intervening dust and gas particles; in near-optical wavelengths (e.g., ugriz, 

objects appear redder than they should. 

10 The assumption that we can treat stated error bars as following a Gaussian was confirmed 

by Dr Daniel Vanden Berk, Dept of Astronomy and Astrophysics, Penn State University, in private 



communication. This assumption is crucial for determination of output confidence in \ 6.2.1 
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We have arranged the data into five different input sets (with set sizes par- 
enthetical) for training and testing: ugriz (46,420); C ug ,C gr ,C r i,Ci Z (46,420); 
ugriz, C ug , C gr , C r i, Ci Z (46,420); ugriz^i^ (6,192 matched to 2MASS); ugrizJHK 
(6,192); and C ug ,C gr ,C r i,Ci Z ,C z j,CjH,CHK (6,192). For each dataset, we have 
the above 4-9 input magnitudes, corresponding z speci and, for some tests, pho- 
tometric error values PI our aim is to investigate how effectively different machine 
learning techniques can make use of these photometric input data to estimate quasar 
redshifts. 



■'■■'"For errors of colours C X y, we sum the errors a x and cr y in quadrature: ^c my = \j a \ + ° 
following Richards et al. (2001b) [32]. 



CHAPTER 6 



Artificial Neural Networks for Photo- z Estimation 



6.1. Basics of Artificial Neural Networks 

6.1.1. Motivation and Network Structure. The artificial neural network 
(ANN) has as its model the functioning of the human brain [30, p. 166]: the 
basic unit of an ANN is the neuron, which takes a real-valued input and fires — 
outputting or 1 or perhaps a value in (0, 1) — according to an activation function 
(e.g., a Heaviside step function or a sigmoid function) over the input. These neurons 
are arranged into a network architecture, typically layered such that all neurons in 
adjacent layers are connected to each other, output-to-input. 



With this ordered structure, as seen in Figure 6.1 neurons in the middle ('hid- 
den') and output layers have multiple inputs, which are weighted in the training 
stage described below. Additionally, a bias term is added to each neuron's input 
(in the hidden and output layers) before its output is calculated. Thus, an ANN 
with one hidden and one output layer (we call this a two-layer ANN, as in [9j p. 
119]) is completely specified by its network architecture, two matrices ('vectors') 
each of weights and biases, and the activation functions used inside the neurons. 



6.1.2. Network Training and Error Backpropagation. An ANN learns 
by adapting its weight and bias vectors based on a set of training data of input 
vectors and corresponding output values. The goal of the training stage is to adapt 
the weights and biases in the network so as to minimise errors over the training 
examples; however, one may allow for some small errors over the training set in 
order to avoid overfitting (memorising) the training data at the expense of the 
ANN's predictive ability on unseen test sets. 

To properly adjust the weight and bias vectors to fit the training set, ANNs 
are trained with an algorithm known as Backpropagation [29, p. 97], which 
first runs a training input through the ANN, then 'propagates' errors backwards 
through the network by attributing corrections to individual neurons according to 
their responsibility in contributing to the overall error [9, p. 140]. A full description 
of the Backpropagation algorithm can be found in Mitchell (1997) [29l p. 98]. 
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Figure 6.1. The structure of a 2:5:1 artificial neural network. 
From http: // en. wikibooks . org/wiki/Artif icial_Neural_Networks. 

6.1.3. Function Representation in an ANN. It is useful to know how 
many layers and how many neurons we must have in an ANN in order to properly 
represent the non-linear functional relationship between photometric magnitudes 
and redshifts. Bishop (1995) demonstrates that "any given decision boundary can 
be approximated arbitrarily closely by a two-layer network having sigmoidal activa- 
tion functions" [9j p. 126] and a sufficiently large number of neurons in the hidden 
layer ('hidden units'). While this does not tell us how many neurons or digits of 
numerical precision we will require for a particular problem, it docs remind us that, 
in principle, two-layer ANNs are equivalent in representational capability to net- 
works with three or more layers. Accordingly, for simplicity's sake, we will restrict 
our investigations to two-layer networks. 

6.2. Concerns in Redshift Estimation 

6.2.1. The Jacobian Matrix and Output Confidence. In astrophysical 
applications, it is important to know not just the best-estimate value, but also 
to quantify the accuracy of that estimate Q Since we have one-sigma (normally 
distributed) measurement errors (a u , etc.) for each of the ugrizJHK magnitudes 
in the SDSS quasar dataset, we demonstrate how they can be used to derive error 
bars for individual z p i lot outputs in an ANN. 

x Cf. [6] and [321. 
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Given the use of differentiable sigmoid functions in our hidden layer (as op- 
posed to some non-differentiable step functions), we can apply a series of partial 
derivatives to propagate errors for inputs Xi (cr x ., understood in the sense of Sxi) 
forwards through the network (similar in form to the Backpropagation algo- 
rithm) to arrive at a variance <jy for output y = z p hot- This is done by means of 
the Jacobian matrix J, where 

<"> *-£ 

for a single-output network. The Jacobian matrix thus "provides a measure of the 
local sensitivity of the outputs to changes in each of the input variables" |10[ p. 
247] according to 

(6.2) <^E J *<- 

i 

We therefore present one-sigma confidence intervals a y for z p h ot values esti- 
mated with ANNz, although we note that similar estimations are possible for radial 
basis function networks with the differentiable basis functions we present in Chap- 
ter [7J It should also be borne in mind that these stated uncertainties are only those 
attributable to the presence of photometric noise in the SDSS measurements; they 
do not take into account degeneracies in the colour-redshift relation or difficulties 



inherent in estimating quasar z p hot values at certain redshifts (see ^ 8.2 ) 



6.2.2. Gaussian Weight Distribution and Committees of Networks. 

Before an ANN is trained, its weight and bias vectors are initialised with values 
close to and normally distributed about 0. Since the particular (local) minima 
found by weight optimisation algorithms depends on the randomised initial val- 
ues [9l p. 255], ANNs with slightly different starting parameters will end up with 
slightly different representations of the estimated function. In fact, Way & Sri- 
vastava (2006) point out that "distribution of errors follows a Gaussian" despite 
the nonlinearity of the computed function |43j . So, rather than training several 
ANNs and choosing the network with the best performance on the training or test 
set, multiple trained networks can be combined to form a committee of networks; 
Bishop (1995) demonstrates that these committees will have expected error values 
of at most the expected error of an individual network [9, p. 366]. This result is 
due primarily to the averaging out of the normal variance in z p hot output created 
by the Gaussian distribution of initial parameters. 
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Because of the expected nonlinearity of the functions represented by ANNs, we 
do not average the weight or bias vectors of these conjoined networks, but we take 
the mearj^j of their Zphot output values. In our implementation, each network in a 
committee differs only in its initial weights and biases before training, and not in 
architecture, in choice of activation functions, or in the order/selection of training 
examples from the datasetj^] 

6.3. Implementation of Artificial Neural Networks 

6.3.1. ANNz. We use two implementations of ANNs in our investigations. 
The first and primary implementation is a software package by Collister & Lahav 
(2004) called ANNz |13j . nominally presented as a tool specially designed to apply 
ANNs to the problem of photometric redshift estimation. In fact, ANNz is merely 
a general-purpose ANN tool that implements such advanced techniques as the Ja- 
cobian matrix, quasi-Newton optimisation]^] and committees of networks; nothing 
in its code is specifically designed for photometric redshift estimation. 

ANNz will construct a feed-forward ANN (the specific variety of ANN used is 
known as a multi-layer perceptron, or MLP) of any size, with any number of layers 
and any number of neurons in each layer. The notation we use, following Firth 
et al. (2002) |15j . specifies the number of neurons in each successive layer; e.g., 
4:6:1 specifies a two-layer network with 4 inputs, a single output, and one hidden 
layer of 6 neurons. This network is then trained on a training set (we have used 
training sets of a random sample of 60% of our data) whilst being validated on 
a validation set (20%). This validation set allows ANNz to avoid overfitting the 
training dataj^Jit fits to the training data for any specified number of iterations but 
selects the network weights and biases that produce the lowest root-mean- square 
(RMS) deviation — between its estimated z p hot and the accurate z spec value — over 
the validation set 



(6.3) CRMS = V ((Zphot - Z spec ) 2 ) 



2 Mcdian can also be used, cf. 1401 and | 15| . 

^Cf. again Vanzella et al. (2004) 1401 who use simple gradient descent. The order of training 

examples makes no difference in our implementation, as we use batch optimisation methods |10l 

p. 240] that act on the entire training set simultaneously. 

4 The limited-memory quasi-Newton technique [9] p. 289] used by ANNz is a method of 

optimising hidden-layer weights that is more robust than the scaled conjugate gradient method 



(see f 6.3.2 I but has significantly higher computational cost. 



5 This follows from Mitchell (1997) BH p. Ill] and Bishop (1995) [9] p. 372]. 
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The validation set thus mimics the test set and allows the ANN to test its param- 
eters against unseen data before being applied to the test set (the remaining 20% 
of the data), fitting to the training data as closely as possible while still preserving 
the generalisation capabilities of the network. 

Additionally, ANNz estimates variances in its z p hot outputs as described in 
6.2.l| 6 and readily applies committees of networks as described in [6.2.2 We 



present results using both of these techniques in j ]6.4| Hidden units in ANNz use 
the logistic sigmoid activation function^] [10 p. 228]: 

( 6 - 4 ) = TT^ 

1 + e a 

whose output values are restricted to the interval (0, 1). On the other hand, units in 
the output layer use a linear activation function (a simple sum of its weighted inputs 
and biases; linear transformations are unnecessary given the training process) so 
as not to restrict the range of possible network outputs. Note that there is no loss 
of generality with the linear output [9, p. 127]; it does not limit our network's 
representational capability as put forward in §6.1.3| 

6.3.2. MLPs in Netlab. Our second implementation of ANNs (MLPs) is in 
Netlab^ a toolbox of MATLAB® methods written by Ian T. Nabney and Christo- 
pher M. Bishop. Netlab provides a slightly more customisablc implementation, 
allowing us to make use of hyperbolic tangent activation functions in the hidden 
layer: 

(6.5) tanh{a) 



which arc equivalent in representational power to logistic sigmoid functions [9j 
p. 127] yet tend to converge faster in training^] Additionally, Netlab provides 
the option of using a scaled conjugate gradient^ algorithm to train the weights 
at O(NW) computational cost, instead of the 0(NW 2 ) cost for a quasi-Newton 
method [9j p. 288], where N is the number of training examples and W is the 
number of adaptive weights. However, MLPs in Netlab cannot have more than 

^ANNz also includes error due to deviation in committee members' estimations; the program 
sums this standard deviation in quadrature with the photometric measurement errors, but we 
have removed this committee error in order to focus on error due to measurement noise. 

^See source code at http://zuserver2.star.ucl.ac.uk/~lahav/annz.src.tar.gz. 

^http: //www. ncrg. aston. ac .uk/netlab/ 

9 Cf. also [30] p. 185]. 

"^See [9] p. 282] for a description of this technique. 
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one hidden layer and do not include the readily calculated output variance figures 
according to the Jacobian matrix, as in ANNz. Still, the training of MLPs in 
Netlab gives us a more direct point of comparison to radial basis function networks 
(Chapter [7j also implemented in Netlab) with respect to accuracy, efficiency, and 
convergence. 

6.4. Results and Analysis 

For ANNz, we use network architectures of x:10:l, x:20:l, x:40:l, and x:100:l 
for each dataset, and we form committees of five networks in each case, trained up 
to 1,000 iterations or until network convergence (whichever is first). We present 
committee z v \ lot RMS errors and the average (also RMS) error attributable to pho- 
tometric noise in the measurements, as well as the percentage of redshifts success- 
fully predicted within Az = [0.1, 0.2, 0.3], where Az = \z p hot — z sp ec\, for consistency 
with the literature. For MLPs in Netlab, we use the same network structures as 
with ANNz, but instead present some individual network RMS errors: the mini- 
mum, maximum, and mean RMS for networks in each committee, as well as the 
crms obtained by the committee as a whole, after 500 and 1,000 training itera- 
tions. The maximum number of iterations (1,000) was selected after observing the 
convergence behaviour of x:40:l and x: 100:1 networks training over 3,000 iterations 
while monitoring the error over the corresponding test set. Errors were found to 
be minimal between 800 and 1,500 generations; training to a maximum of 1,000 
iterations should allow us to avoid overfitting most of the training sets. Finally, we 
look at plots of z p hot vs. z spec and discuss the source of errors in z p /j 0t estimates. 



6.4.1. Results with ANNz. Tables [671] , [O] |Q| and [6T4] present the error 
values achieved with ANNz for architectures with 10, 20, 40, and 100 hidden units, 
respectively. 

Although RMS errors generally shrink with larger network sizes, the improve- 
ment is clearly bounded (using a larger hidden layer does not always decrease error) , 
and in some cases (e.g., ugriz§\§i and ugrizJHK for 100 hidden units) error ac- 
tually increases with the larger network. Since ANNz does not merely take the 
final set of weights and biases in training, but instead uses the weights and biases 
that minimise error over the validation set, we do not see memorisation effects for 
small datasets in larger networks. Instead, the increase in error for ugriz 61 g 2 and 
ugrizJHK is due to the larger networks' inability to train their weights and bi- 
ases sufficiently over the smaller (order 6,192) datasets. Additionally, in previous 
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Dataset 


^ KMo 


RMS„„ i= „ 

±u.vxu noi g e 


Az < 0.1 


< 0.2 


< 0.3 


ugriz 


0.4241 


0.1752 


0.3193 


0.5444 


0.6587 




0.4256 


0.1731 


0.3313 


0.5476 


0.6603 


ugriz , Cugi C*gri ^rii Ci z 


0.4148 


0.1511 


0.3372 


0.5607 


0.6729 


ugriz 6 i92 


0.4940 


0.2204 


0.2410 


0.4140 


0.5460 


ugrizJHK 


0.3718 


0.1695 


0.3740 


0.6120 


0.7520 


•-^ug) ^gri Wij <-^«) ^zJi ^JH: ^HK 


0.3857 


0.2793 


0.3550 


0.5960 


0.7190 



Table 6.1. ANN 2 results with 5-committees of x: 10:1 networks. 
RMS n oiso is the estimated error attributable to photometric mea- 



surement noise, as discussed in i 6.2.1 



Dataset 


CRMS 


RMS no i se 


Az < 0.1 


< 0.2 


< 0.3 


ugriz 


0.4064 


0.1819 


0.3766 


0.5906 


0.6957 


Cug j Cg r i C r i , C%z 


0.4109 


0.1763 


0.3709 


0.5885 


0.6943 


Ugriz , Cugi Cgn Crii Ciz 


0.3958 


0.1470 


0.3753 


0.5940 


0.7075 


ugriz 6lg2 


0.4793 


0.2049 


0.3140 


0.4870 


0.6140 


ugrizJHK 


0.3949 


0.1506 


0.2690 


0.5420 


0.6920 


^ugi ^gri ^ri> ^izi ^zJi ^JHi ^HK 


0.3621 


0.2731 


0.4390 


0.6530 


0.7630 



Table 6.2. ANNz results with 5-committees of x:20:l networks. 



Dataset 


CRMS 


RMS no i S e 


Az < 0.1 


< 0.2 


< 0.3 


ugriz 


0.3970 


0.1773 


0.3963 


0.6100 


0.7123 


(-*ug > C gr ) C r i , Ciz 


0.4053 


0.2135 


0.4019 


0.6148 


0.7089 


ugriz, C ug , C gr , C r i, Ci Z 


0.3877 


0.1611 


0.4091 


0.6265 


0.7244 


ugriz 6lg2 


0.5048 


0.2281 


0.3160 


0.5080 


0.6330 


ugrizJHK 


0.3730 


0.1694 


0.4380 


0.6480 


0.7500 


/ • s i / ' s i / ( i — 1 S 1 

*sug, ^gr, ^ri, ^iz, ^>zJi ^JHi ^HK 


0.3596 


0.2791 


0.4350 


0.6630 


0.7570 



Table 6.3. ANNz results with 5-committees of x:40:l networks. 



Note the marked increase in ctrms for the ugriz^\ g2 set. 



(unlisted) runs we obtained consistent RMS errors of over 0.8 for the ugrize\ g2 
and ugrizJHK datasets because of an unlucky random distribution of data into of 
training and validation sets. This is not to suggest that a dataset with poor dis- 
tributions is necessarily intrinsically unlcarnablc (Nctlab MLPs and RBFNs could 
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Dataset 


u KMo 


RMS .„ 


Az < 0.1 


< 0.2 


< 0.3 


ugriz 


0.3960 


0.1853 


0.3983 


0.6145 


0.7129 


@ug j C(?r i C r [ , Ciz 


0.4067 


0.2022 


0.4174 


0.6228 


0.7152 


ugriz , Cug: Cgr: ^rii ^iz 


0.3854 


0.1599 


0.4199 


0.6380 


0.7301 


ugriz 6 i92 


0.5255 


0.1162 


0.1880 


0.3680 


0.4980 


ugrizJHK 


0.4081 


0.1458 


0.3610 


0.5690 


0.6990 


^ugi ^gr, ^rii ^izi ^zJi ^JHi ^HK 


0.3557 


0.2905 


0.4500 


0.6480 


0.7620 



Table 6.4. ANNz results with 5-committees of x:100:l networks. 



Note the increasing ctrms for ugriz§\§i and ugrizJHK. 

learn on the ugrizJHK dataset that ANNz could not learn), but dataset size, dis- 
tribution into training and test sets, and choice of weight optimisation algorithm 
all contribute to the convergence behaviour of a learning network. 

Also apparent from Tables |6.1f|6.4| is that the breaking down of photometric 
magnitudes into colours (ugriz into [C ug , C gr , C r i, Ci Z ] and ugrizJHK into [C ug , 
C gr , C r i, Ci Z , C z j, Cjh, Chk]) tends to improve orms for ugrizJHK and but 
worsens ctrms for ugriz. It is unclear why this is, but it may be a balancing between 
the information contained in plain magnitudes and the information made more 
explicit in their colours: the amount of information made explicit in [C ug , C gr , C r i, 
Ci Z , C z j, Cjh, Chk] is more than made explicit in [C ug , C gr , C r j, CjJ, possibly 
compensating for the loss of some information in one of the ugriz magnitudes. 
That there is information to be drawn from both the magnitudes and their colours 
is suggested by the fact that [ugriz, C ug , C gr , C r i, Ci Z ) test sets are significantly 
better predicted than either ugriz or [C ug , C gr , C r i, Ci Z ] sets. 

6.4.2. Results with MLPs in Netlab. Since MLPs in Netlab are function- 
ally similar to those trained in ANNz, we consider different results in this section: 
we use committees of five networks of the same architectures as in §6.4.1| but 
we present minimum/maximum/mean/committee crms results. Networks were 
trained up to 500 and 1,000 iterations. This presentation will better illustrate the 
improvement expected when using committees of networks. 

It is clear from Tables [5~5}j6.8| that, as suggested in |6.2.2[ MLP committees 
can be relied upon to produce RMS errors on average no worse than the mean RMS 
errors of committee members (i.e., the expected ctrms for an individual network). 
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Hat r\ sot 


RMS • 


RMS 




RMS 


iter 


tinriz 

Lit Lj I V As 


5883 


6083 


0.5972 


0.5931 


500 


Lit Lj I V £s 


0.5751 


0.5994 


5862 


0.5812 


1000 


c c c c 

^ugt ^grj 


0.4537 


0.4607 


0.4561 


0.4459 


500 


C 1 C 1 C 1 C 1 

^ugi ^ gr j v - y ri; 


0.4506 


0.4526 


0.4519 


0.4420 


1000 


HOT17 (In n CI™ CI^a CjZ~ 
Liy i v/j , ^ugj ^ gr •) ^rii ^iz 


0.5717 


0.5972 


0.5851 


0.5793 


500 


UCfViZj Cugj Cgr-, C r i, Ci z 


0.5637 


0.5759 


0.5715 


0.5634 


1000 


ugriz 6 i 92 


0.8169 


0.8178 


0.8175 


0.8174 


500 


ugriz 6192 


0.8170 


0.8176 


0.8173 


0.8171 


1000 


ugrizJHK 


0.4870 


0.4975 


0.4928 


0.4913 


500 


ugrizJHK 


0.4722 


0.4952 


0.4834 


0.4783 


1000 


C U g , C gr , C r i , Ci z ,C Z J, CjH , ChK 


0.4116 


0.4374 


0.4197 


0.4048 


500 


C U g , C gr , C r i > Ci z ,C Z J, CjH , ChK 


0.4078 


0.4431 


0.4184 


0.4004 


1000 



Table 6.5. MLP/Netlab results with 5-committees of x:10:l net- 
works. Committee results better than the mean are italicised, those 
better than the best individual result (RMS m ; n ) are bolded. 



Dataset 


RMS min 


RMS max 


MRMS 


RMScom 


iter 


ugriz 


0.5775 


0.5871 


0.5839 


0.5744 


500 


ugriz 


0.5488 


0.5808 


0.5628 


0.5469 


1000 


Cug •> Cg r , C r i ; Ci z 


0.4456 


0.4522 


0.4482 


0.4435 


500 


n n n . n 

^ugi ^gn ^rn ^iz 


0.4374 


0.4419 


0.4403 


0.4335 


1000 


ugviZj C U g 7 Cgr-, Cri-, Ci z 


0.5445 


0.5864 


0.5733 


0.5630 


500 


ugviZj C U g : Cgr-, Cri-, Ci z 


0.5277 


0.5741 


0.5572 


0.5429 


1000 


ugriz 6192 


0.8165 


0.8168 


0.8167 


0.8166 


500 


ugriz 6192 


0.8162 


0.8167 


0.8165 


0.8164 


1000 


ugrizJHK 


0.4870 


0.4936 


0.4901 


0.4869 


500 


ugrizJHK 


0.4688 


0.4933 


0.4799 


0.4742 


1000 


C U g, Cgr , Cri , Ci z ,C Z J, CjH , ChK 


0.4057 


0.4209 


0.4130 


0.3963 


500 


C U gi Cgr , C r i 7 Ciz 7 C z J , CjH , ChK 


0.3854 


0.4055 


0.3978 


0.3789 


1000 



Table 6.6. MLP/Netlab results with 5-committees of x:20:l networks. 
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T) fit a sot 


RMS • 


RMS 


/ / "F> "TV T O 

P'HJVlb 


RMS 


iter 


11 or 7 7 

Lit Lj I V As 


0.5700 


6023 


5855 

V ' • ". 7 L 7 ■ 7 ■ 7 


0.5783 


500 


11 nr i 2 

Lit Lj I V £s 


5399 


0.5798 


5635 

V ' • " 1 \ / • 7 ■ 7 


5538 

Li - .7 .7 ' 7 LJ 


1000 


c c c c 

^ugt ^grj 


0.4409 


0.4491 


0.4448 


0.4402 


500 


C 1 C 1 C 1 C 1 

^ugi ^ gr j v - y ri; 


0.4366 


0.4393 


0.4381 


0.4323 


1000 


7lOri7 (In n CI™ CI^a CjZ~ 
Liy 1 v/j , ^ugj ^ gr •) ^rii ^tz 


0.5412 


5693 


5598 


0.5527 


500 


UCfViZj Cugj Cgr-> &ri-> Ciz 


0.5131 


0.5486 


0.5316 


0.5194 


1000 


ugriz 6 i 92 


0.8165 


0.8172 


0.8169 


0.8167 


500 


ugriz 6192 


0.8165 


0.8174 


0.8170 


0.8168 


1000 


ugrizJHK 


0.4819 


0.4990 


0.4919 


0.4871 


500 


ugrizJHK 


0.4725 


0.4943 


0.4830 


0.4760 


1000 


C U g , C gr , C r i , Ci z ,C Z J, CjH , ChK 


0.3997 


0.4335 


0.4105 


0.3916 


500 


C U g , C gr , C r i > Ci z ,C Z J, CjH , ChK 


0.3945 


0.4181 


0.4054 


0.3808 


1000 



Table 6.7. MLP/Netlab results with 5-committees of x:40:l networks. 



Dataset 


RMS min 


RMS max 


Mrms 


RMS com 


iter 


ugriz 


0.5608 


0.5873 


0.5796 


0.5750 


500 


ugriz 


0.5482 


0.5589 


0.5525 


0.5477 


1000 


Cug ) Cgr , C r i ; Ci z 


0.4431 


0.4514 


0.4467 


0.4415 


500 


Cug ) Cgr , C r i ; Ci z 


0.4345 


0.4421 


0.4381 


0.4327 


1000 


ligviz, C U g, Cg r: C r i, Ci z 


0.5481 


0.5909 


0.5678 


0.5634 


500 


ligviz, C U g, Cg r: C r i j Ci z 


0.5215 


0.5438 


0.5326 


0.5277 


1000 


ugriz 6192 


0.8166 


0.8169 


0.8168 


0.8167 


500 


ugriz 6192 


0.8167 


0.8170 


0.8168 


0.8168 


1000 


ugrizJHK 


0.4779 


0.4942 


0.4864 


0.4822 


500 


ugrizJHK 


0.4682 


0.4751 


0.4717 


0.4680 


1000 


C U g , Cgr , Cri , Ci z ,C Z J, CjH , ChK 


0.4094 


0.4438 


0.4201 


0.4055 


500 


C U g , Cgr , C r i > Ci z ,C Z J, CjH , ChK 


0.3905 


0.4521 


0.4122 


0.3878 


1000 



Table 6.8. MLP/Netlab results with 5-committees of x: 100:1 net- 
works. Note the slight increase in RMS com over all the datasets 
including colours C. 
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In a majority of cases, the committees produced results better even than the best 
performing individual network. 

The RMS errors over some datasets are seen to increase between the x:40:l 
and the x:100:l networks for both 500 and 1,000 iterations. However, if this slight 
increase in error were attributable to overfitting (memorisation of the training data 
with the larger hidden layer) , errors over these datasets would probably also increase 
between their 500 th and 1,000 th iterations. Since none of this error increase is 
observed, we can conclude that no major overfitting has likely taken place up to 
1,000 training iterations. 

We have intentionally left in a ugrzz 61 g 2 dataset that has failed to train prop- 
erly (owing to the same unlucky distribution into training and test sets described 



in [6.4.1). This particular distribution frustrated learning over all MLP network 
architectures, and will do likewise with RBFN architectures in |7.4| The training 
and test sets could be reassigned, however, as was done to achieve better results 



in [6.4.1 The significant improvement yielded by adding the JHK magnitudes to 
the ugrizeig2 dataset is best illustrated in Tables |6.1||6.4| 

We note also the superiority of nearly all the ANNz results to those of MLPs in 
Netlab. As the training algorithms for ANNz and MLP/Netlab are nearly identical 
(except for their quasi-Newton vs. scaled conjugate gradient methods for weight 
optimisation and ANNz's minimisation of error over the validation set), and as we 
do not observe any signs of significant overfitting, we might conclude that ANNz's 
use of a validation set has allowed it to generalise much more efficaciously than 
MLP/Netlab. This is to say that, by training over a known training set yet mea- 
suring error over an unseen validation set, ANNz has more closely approximated 
the 'true' functional relationship between photometric measurements/colours and 
(spectroscopic) redshift. 



6.4.3. Analysis. Figure 6.2 shows a typical z p hot-z spe c plot, with systematic 
deviations that are also areas of 'catastrophic failure' (having very high Az). To 
better understand the source of these deviations, we look at Figure |6.3| which 



demonstrates that a y (deviation due to photometric noise; see [6.2.1 1 is not highly 
correlated with z spec ; accordingly, the majority of the error relative to z spec is due 
to the inability of our ANNs to learn the colour-redshift relation. This Az error 



relative to z spec is plotted in Figure 6.4 which indicates that error is much higher 
at specific redshifts; the astrophysical explanation for this finding is put forward in 
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Figure 6.2. z phot vs. z spec for ugriz, 5:10:1, ANNz. ctrms is 0.4241. 

3.5 1 1 1 1 1 1 

3 - 

2.5 - * - 

2- t 

1.5- ; * 

1 - * , : 




Figure 6.3. Photometric deviation a y vs. z spec for ugriz, 5:10:1, 
ANN 2;. RMS noisc is 0.1752. 
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FIGURE 6.5. z p hot vs. z spec for ugriz^xw, 5:10:1, ANNz. ctrms is 0.4940. 



r>(> 
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0.5 1 



Figure 6.6. z. 



phot 



for ugrizJHK, 8:10:1, ANNz. ctrms is 0.3718. 



Figures 6.5 and 6.6 present Zp^ot-Zspec plots for ugriz§\<$i and ugrizJHK, for 
comparison to Figure [672] The reduction of dataset size from 46,420 to 6,192 over 
the ugriz filters creates a noticeable dispersion in the z p hot predictions in Figure 
6.5. (Notice that the familiar systematic deviations are still manifest in the smaller 
ugrizQig2 dataset.) However, once JHK filters are added, ctrms drops noticeably 



(Figure 6.6), and even some of the catastrophic failure is tempered. 



Figure 6.7 demonstrates the failed learning of the ugriz§\§2 dataset that was 
poorly distributed into training and test sets. The majority of ugriz magnitudes 
that were matched to available JHK magnitudes had corresponding z spec values in 
the interval (0,2). As this was a highly problematic interval in the colour-redshift 
relation, it is unsurprising that our MLP failed to learn the data properly; what 
is surprising is the degree of its failure. The ctrms between z p hot and z spec over 
ugriz 6 ig2 for a network that output only values of z p hot = 1 would be 0.8162; the 
network in Figure [6~7| achieved ctrms — 0.8168. An investigation of the degeneracy 



in the colour-redshift relation in the interval (0,2) is carried out in {8.2 
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FIGURE 6.7. z phot vs. z spec for ugriz 6 ig 2 , 5:100:1, MLP/Netlab, 
after 1,000 iterations. The network failed to learn the given dataset 
over training, and so suggests z p hot values closely distributed about 
unity, ctrms is 0.8168. 



CHAPTER 7 



Radial Basis Function Networks for Photo- £ 

Estimation 

Similar in structure and representational ability [9l p. 168] to ANNs, radial 
basis function networks (RBFNs) have some important advantages over ANNs, yet 
are not as widely considered in the astrophysical literature^] 

The most obvious feature that RBFNs share with ANNs is their network struc- 
ture: an RBFN is basically identical in form to a two-layer MLP except for its 



differing activation functions in the hidden layer (see { 7.1 1. An RBFN with N hid- 
den units is able to produce a smooth function that can precisely fit N data points 
in a training set and can interpolate between the points. However, in our applica- 
tion, we want to approximate the underlying function generating our training data 
and account for noisy measurements j9j p. 167]; we will sacrifice some accuracy in 
fitting training data in order to maintain the generalisation ability of our network. 
Accordingly, we will use far fewer hidden units than training data points, as in an 
ANN. 

7.1. Radial Basis Functions 

The heart of an RBFN is, predictably, the radial basis function (RBF), used 
as the activation function in the hidden layer of an RBFN. For an RBFN with M 
inputs and N hidden units, an RBF is a function </>„(•) centred at an M-dimcnsional 
basis vector \x n that varies only with the Euclidean distance (in M-space) between 
the input vector x and the basis vector fj, n |101 p. 299]. Thus, the output of an 
individual hidden-layer neuron in an RBFN is in the forrrj^] 0„(x) = </>(||x — fi n \\) 
where </>(•) is one of the kernel functions defined below. 

The kernel functions (weighted about a fixed centre) <f> that we use with our 
basis functions (forming bases of which network outputs are linear combinations) 



^However, cf. 1421 and 1431 . where they are referred to as 'kernel methods'. 

2 The notation is our own, varying slightly from those of [9], [10], [33], [18], [43], and [42]: 

we mean to convey more clearly that the RBF </» n (x) is really a parameterised activation function 

of the two variables x and p n (the fixed parameter), while <j>(x) is a general kernel function. 
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<j> n are the Gaussiai 



(7.1) 



4>{x) = e-^l^) 



and the thin-plate spline functio: 



(7.2) 



4>{x) 



x 2 \n(x). 



Note that hyperspherical (radially symmetric) Gaussians are used for simplicity; 
however, these Gaussians can be generalised to better fit the training data with 
hyperelliptical basis functions 9 , p. 35]. Hyperclliptical basis functions are partic- 
ularly useful in the presence of irrelevant input variables |9j p. 184] to overcome 
the curse of dimensionality 



Bishop (1995) notes, "there are many potential applications for neural networks 
where unlabelled input data [without targets] is plentiful, but where labelled data is 
in short supply" [9j p. 183]. This is precisely the case with the redshift estimation 
problem: photometric ugriz data are readily available and easily acquired, but 
proper z spec measurements are few and will continue to lag behind by orders of 
magnitude [6]. While there are many ways to train an RBFN, the following two- 
stage training procedure is well suited to dealing with the stated problem. The 
unlabelled ugriz data can be used for unsupervised learning as described below, 
while the smaller amount of z spec -labelled data can be used in the second stage of 
training. Contrast this to the training procedure of an ANN, then, which will only 
be able to utilise the smaller, labelled subset of data. 

7.2.1. Unsupervised Parameter Optimisation. In an RBFN, basis func- 
tion parameters and second-layer weights could be optimised like those of an ANN 
with an iterative, supervised training algorithm; however, implementations (like 
ours) that sacrifice optimal accuracy in favour of computational efficiency will make 
use of unsupervised learning techniques for part of the RBFN training process. 

The two layers in our RBFNs are trained separately; as our networks have fewer 
basis functions than training data points, the centres fi n and widths a n of the basis 
functions (j) n in the first layer are determined in advance with an algorithm that 
is blind to the target data of the training set. This means that such an algorithm 

^For the Gaussian, the width parameter a n is fixed in advance along with the centre fj, n . 
^The spline function <j>(x) = x 4 ln(x) was also tested, but it produced results less accurate 

than those of the thin-plate spline function in all tests. 
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needs only the training data inputs to fix the first-layer parameters; centres and 
widths are calculated to represent the estimated mixture density p. 59] of the 
input vectors. 

7.2.2. Linear Determination of Weights and Biases. The second-layer 
weights in our RBFNs are then calculated by a supervised method, but they are not 
iteratively optimised as in an ANN. Instead, the output values of the training data 
are used, along with the newly fixed centres and widths of the basis functions, to 
directly compute the second-layer weights and biases with a set of linear equations 
[9l p. 171]. As this is a linear problem, it is considerably less computationally 
intensive than the non-linear optimisation problems in ANN training, for example. 

7.3. Implementation of RBFNs in Netlab 

Our implementation of RBFNs is in Netlab, the MATLAB® toolbox. As men- 
tioned in |7.1[ we use a hypcrspherical Gaussian and the thin-plate spline function 
as our hidden-layer activation functions. Our output is linear, and training is done 
in two stages as described in §7.2| 

The first training stage uses the 'expectation-maximisation' (EM) algorithm to 
centre the basis functions. Generally speaking, the EM algorithm estimates the N 
basis function centres as though the input data were composed of N hyperspherical 
Gaussian distributions mixed amongst each other in the input space [101 pp. 435ff.]. 
The widths of the Gaussians are then set to some appropriate value; we use the 
square of the maximum inter-centre Euclidean distance. In the second training 
stage, the second-layer weights and biases are quickly calculated from the first- 
layer parameters by minimising sum-of-squared error over the training examples, 
following a simple matrix algebraic result [9j p. 92] . 

It happens that the EM algorithm is initialised with randomised, normally 
distributed parameters, and so its outputs, the basis function parameters, are 
thereby subject to the same normal variance as ANNs in Chapter [6] |6.2.2| Addi- 
tionally, the linear calculation of second-layer weights and biases is deterministic, 
so the weights — and therefore the z p hot outputs — of the RBFN are normally dis- 
tributed along with the first-layer parameters. Accordingly, we can use committees 
of RBFNs in the same way that we use committees of ANNs, taking the mean of 
the committee members' outputs as our final z p hot value. 
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Figure 7.1. z phot vs. z phot for two individual 5:100:1 TPS RBFNs 
over ugriz. 

7.4. Results and Analysis 



As in E 6.4.2 we present niininium/niaxiniuni/mean/coniniittee RMS results. 
All RBFNs were trained with 100 iterations of the EM algorithm for centring the 
basis functions. Again, as in §6.4.2| our committees regularly achieve lower RMS 
errors than the expected individual network RMS, seen in Tables |7.1f|7.8| Figure 



7.1 illustrates the expected Gaussian deviation between committee members, as 
discussed in §6.2.2| 

The difference in prediction accuracy between RBFNs with Gaussian activa- 
tions and those with thin-plate-spline (TPS) is often considerable. Although neither 
Gaussian nor TPS activations are demonstrably superior for any given network ar- 
chitecture, some trends are observed when considering individual datasets. 

Looking at committee ctrms results, the RBFNs with Gaussian activations 
achieve consistently lower errors than those with TPS activations over the ugriz 
and ugrizJHK datasets. The opposite is true for the [C ug , C gr , C r i, Ci Z ] dataset, 
and comparable results are achieved with [ugriz, C ug , C' gr , C r i, C LZ \. Curiously, 
over the [C ug , C gr , C r i, Ci Z , C' z j, Cjh, Chk] dataset, Gaussians yield significantly 
higher accuracies in 7:10:1 and 7:20:1 networks, but TPS performs substantially 
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Dataset 


RMS min 




A' 1 RMS 




ugriz 


0.5717 


0.5790 


0.5750 


0.5720 


Cugi Cg T , C r i, Ci z 


0.5526 


0.5562 


0.5548 


0.5533 


UQTIZ, C U g, Cg r , C r i, Ci z 


0.5709 


0.5776 


0.5757 


0.5731 


ugriz 6192 


0.8165 


0.8169 


0.8167 


0.8166 


ugrizJHK 


0.5081 


0.5116 


0.5099 


0.5085 


^ug, ^gr, ^rij ^iz, ^->zj > ^ J H > ^HK 


0.5121 


0.5221 


0.5169 


0.5142 



Table 7.1. RBFN results with 5-committees of x:10:l networks, 
using the Gaussian activation function with 100 iterations of the 
EM algorithm for basis centring. Committee results better than 
the mean are italicised, those better than the best individual 
result (RMS m ; n ) are bolded. 



Dataset 


RMS min 


RMS max 


URMS 


RMS com 


ugriz 


0.6011 


0.6213 


0.6120 


0.6077 


Cugj Cg r , C r i, Ci z 


0.4814 


0.4973 


0.4860 


0.4809 


ugriz , C U g, Cg r , C r i, Ci Z 


0.5884 


0.5942 


0.5918 


0.5896 


ugriz 6W2 


0.8164 


0.8175 


0.8168 


0.8167 


ugrizJHK 


0.5217 


0.5356 


0.5285 


0.5256 


C-ug, C gr , C r i, Ci z , C z j, Cjh, ChK 


0.5745 


0.5932 


0.5857 


0.5740 



Table 7.2. RBFN results with 5-committees of x:10:l networks, 
using the thin-plate-spline activation function. 



Dataset 


RMS min 


RMS max 


URMS 


RMS com 


ugriz 


0.5696 


0.5784 


0.5737 


0.5712 


c c c 

^ugi ^gri ^rn ^iz 


0.5527 


0.5569 


0.5551 


0.5524 


ugriz , C U g, Cg r , C T i, Ci Z 


0.5660 


0.5765 


0.5703 


0.5684 


ugriz 6192 


0.8164 


0.8170 


0.8167 


0.8165 


ugrizJHK 


0.5080 


0.5092 


0.5086 


0.5079 


C ug , Cg r , C r i, C\ z , C z j, Cjh, Chk 


0.5046 


0.5234 


0.5134 


0.5091 



Table 7.3. Gaussian RBFN results with 5-committees of x:20:l networks. 
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Dataset 


RMS min 

*** 1111 I 1 


XvJVlSiTiav 

*** llldX 


/^RMS 


riJVLSrrirri 

UU111 


ugriz 


0.6052 


0.6200 


0.6134 


0.6105 


Cugi Cg ri Crii Ci z 


0.4821 


0.4984 


0.4859 


0.4812 


1igviz : C U g : Cgri Crii 


0.5876 


0.5979 


0.5903 


0.5879 


ugriz 6 i92 


0.8157 


0.8173 


0.8164 


0.8163 


ugriz JHK 


0.5218 


0.5312 


0.5266 


0.5248 


*^ugi ^gr: ^ri, ^iz, ^>zJi ^JH, ^HK 


0.5244 


0.6113 


0.5724 


0.5591 



Table 7.4. TPS RBFN results with 5-committces of x:20:l networks. 



Dataset 


RMS min 


RMS max 


A*RMS 


RMS com 


ugriz 


0.5477 


0.5555 


0.5503 


0.5479 


C U gi Cgri ^rii &iz 


0.5326 


0.5508 


0.5378 


0.5340 


ligriz., Cugi Cg r: Crii Ciz 


0.5367 


0.5534 


0.5458 


0.5427 


ugriz 6192 


0.8176 


0.8223 


0.8196 


0.8193 


ugriz JHK 


0.4811 


0.4886 


0.4854 


0.4832 


^ugi ^gri ^rij ^iz-, '-'zjj ^JH, ^HK 


0.5133 


0.5473 


0.5305 


0.5263 



Table 7.5. Gaussian RBFN results with 5-committees of x:40:l networks. 



Dataset 


RMS min 


RMS max 


^RMS 


RMS com 


ugriz 


0.5734 


0.5834 


0.5797 


0.5758 


C U gi Cg ri Crii Ci z 


0.4568 


0.4649 


0.4605 


0.4561 


ugriz , Cugi Cg ri C r i, Ci Z 


0.5580 


0.5652 


0.5621 


0.5569 


ugriz 6192 


0.8177 


0.8198 


0.8187 


0.8183 


ugrizJHK 


0.5017 


0.5114 


0.5057 


0.5015 


*^ug, ^gr: ^rij ^iz, ^zj 7 ^JH, ^HK 


0.4778 


0.4946 


0.4890 


0.4825 



Table 7.6. TPS RBFN results with 5-committees of x:40:l networks. 



better in 7:40:1 and 7:100:1 networks. In fact, the Gaussian RBFNs in the 7:100:1 
networks produce outliers of sufficient magnitude to raise /iRMS to 0.9940, and 



one network failed to learn the data at all (Figure 7.2 1. The 5-committee for these 
networks, however, is seen to reduce RMS com to 0.8141. (A 5-committee calculating 
the median instead of the mean over these networks' outputs yielded an RMS com of 
0.8263; some outliers — even z p hot ~ 30 — were consistently produced by committee 



members, as seen in Figure 7.3 ) 
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Dataset 


RMS min 

*** 1111 I 1 


XvJVlSiTia-v- 

*** llldX 


/^RMS 


rvJVLSrrirn 

UU111 


ugriz 


0.5220 


0.5441 


0.5295 


0.5216 


Cugi Cg ri Crii Ci z 


0.5232 


0.5650 


0.5400 


0.5247 


1igviz : C U g : Cgri ^rii 


0.5138 


0.5302 


0.5208 


0.5186 


ugriz 6 i92 


0.8278 


0.8539 


0.8372 


0.8359 


ugriz JHK 


0.4551 


0.4674 


0.4624 


0.4555 


^ugi ^gr: ^ri, ^iz, ^>zJi ^JH, ^HK 


0.9373 


1.0595 


0.9940 


0.8141 



Table 7.7. Gaussian RBFN results with 5-committccs of x:100:l networks. 



Dataset 


RMS min 


RMS max 


A*RMS 


RMS com 


ugriz 


0.5226 


0.5317 


0.5290 


0.5232 


Cugi Cgri C r i^ C%z 


0.4366 


0.4385 


0.4379 


0.4352 


Ugriz., Cugi Cgri 


0.4841 


0.4959 


0.4878 


0.4834 


ugriz 6192 


0.8256 


0.8276 


0.8261 


0.8441 


ugriz JHK 


0.4640 


0.4713 


0.4669 


0.4605 


/ ■ r ■ / ' r ■ / ■ / < r 1 

Wg, ^gri ^rij ^izi *~>zJl ° J H ) ^HK 


0.4040 


0.4181 


0.4096 


0.4014 



Table 7.8. TPS RBFN results with 5-committees of x:100:l networks. 



All 80 RBFNs produced failed to learn the ugrizeig2 dataset with its orig- 
inal training/test set distribution, which was left as- is. The TPS RBFNs with 



100 hidden units (see Figure 7.4 1 showed the most deviation from outputting only 
Zphot — 1, and significantly more deviation than MLPs showed in Figure 6.7 



Finally Figures |775| and [7T6| illustrate two RBFNs that, while not as accurate as 
the ANNs of Figures |6.2| and |6.6| are highly similar in form. This indicates that they 
suffer from the similar difficulties in learning and generalising on the ugrizJHK 
data; it also recalls their rough equivalence in representational capability. 

The general symmetry for mild outliers at Az ~ [1,2] about the line z p h ot — 
z spec in Figures 7.6 and |6.6| brings more clearly to light the degeneracy suggested 



by Richards et al. (2001b) [32 and Weinstein et al. (2004) |44j : certain quasars 
are close matches for two or more redshifts, and the z p f- lot -z spec relation illustrates 



this confusion by plotting them symmetrically about z p h ot — z spec . [8.2 discusses 
this degeneracy in detail. 
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Figure 7.2. z phot vs. z spec for a 7:100:1 Gaussian RBFN over 
[Cug, C gr , C ri , C lz , C zJ , Cjh, Chk}- "rms is only 1.0595— not 
dissimilar to other networks in the same committee that converged 
but had large outliers. 



66 7. RADIAL BASIS FUNCTION NETWORKS FOR PHOTO-z ESTIMATION 




Figure 7.3. z p hot vs - z phot for two individual 7:100:1 Gaussian 
RBFNs over [C ug , C gr , C ri , C iz , C zJ , C JH , C H k]- Note the large 
outliers consistently predicted by both RBFNs. 
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Figure 7.4. z^oi vs. z spec for the 5-committee of 5:100:1 TPS 
RBFNs over ugriz^i. ctrms is 0.8441. 




Figure 7.5. z v h ot vs. z spec for the 5-committec of 5:100:1 Gauss- 
ian RBFNs over ugriz. orms is 0.5216. 
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Figure 7.6. z phot vs. z spec for the 5-committee of 7:100:1 TPS 
RBFNs over [C ug , C gr , C rl , C iz , C zJl C JH , C HK ]. ctrms is 0.4014, 
and 69.5% of objects are predicted to Az < 0.3. 



CHAPTER 8 



Discussion 



8.1. Systematic Errors 

In presenting a survey of network-based learning methods for photometric red- 
shift estimation, we have done very little to minimise RMS errors by way of param- 
eter optimisation. Still, we have achieved ctrms values of 0.3557 and 0.4014 with 
ANNs and RBFNs, yielding Az values within 0.3 for 76.2% and 69.5% of quasars, 
respectively. These results are comparable to recent standard results in the litera- 
ture ( |27j . [44], [4], [46]), but they do not compare to the latest results of Ball et 
al. (to appear), who are able to select a subset of quasars whose redshifts may be 
estimated with significantly greater accuracy (improving crms from 0.343 to 0.117, 
and the percentage of quasars within Az < 0.3 from 79.8% to 99.3%.) 6 Q 

All published results suffer from the same systematic errors at certain values of 
z spec , however. In fact, with hidden layers of up to 100 units, we were unable even to 
memorise our training sets after 3,000 training iterations: testing over our training 
sets yielded the same deviations and regions of catastrophic failure in our z p h ot -z svec 
relation. This indicates some degeneracy inherent to the training sets; i.e., some 
sets of photometric magnitudes and colours correspond to multiple spectroscopic 
redshifts. 

8.2. Colour-Redshift Relation and Spectral Lines 



As discussed in [5.5 the colour-redshift relation (CZR) is of primary impor- 
tance in estimating redshift from photometric measurements. Figure [8T| [32] plots 
the CZR for the colours C ug , C gr , C r i, and Ci Z . Richards et al. (2001a) describe 
in detail the features of the CZR, and find that almost all of the structure can be 
attributed to emission lines moving in and out of the ugriz filter ranges at different 
redshifts |31j . This is consistent with the observations of Ball et al. (to appear) 



^Ball et al. (to appear) use statistical techniques that amount to excluding quasars that are 
near CZR degeneracies in colour-colour space. They exclude 61.1% of their dataset, but, as this 
is not even an order-of-magnitude loss and they have managed to remove nearly all catastrophics, 
their result is exceptional. 
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Figure 8.1. Colour-redshift plots of SDSS quasars indicating me- 
dian colour as a function of redshift (solid curve) and one-sigma er- 
rors (dashed curves). The relation is degenerate in colour-redshift 
space if it is not one-to-one. From Richards et al. (2001b) 
adapted from 



who demonstrate that major deviations in the z p i lot -z spec relation fall at redshifts 
where important emission lines cross SDSS hlter boundaries (Figure 8.2) [6]rj 



Further, as illustrated in Figures |8.1| and |8.3| there is significant degeneracy 
in the CZR, indicating that it is impossible to predict redshift for all quasars as a 
function of four colours alone. Note that higher-quality photometry will not solve 
the problem: while some level of photometric certainty is needed to approach the 
CZR, recall from |6.2.1| that we were able to deviate our inputs to an ANN by 
the estimated photometric noise to find the resulting change a y in z p hot- As seen 



Cf. also [4] for redshift values at which important emission lines cross ugriz transitions. 
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Figure 8.2. z v h ot -z svec plots indicating where the five brightest 
emission lines cross ugriz filters. From Ball et al. (to appear) [6]. 



in Figure |6.3| photometric noise is not responsible for the majority of the error 
Az. Therefore, as Way & Srivastava (2006) suggest for empirical reasons [43j . 
photometry improvements alone will not overcome the CZR degeneracy. 

In order to break the degeneracy, we need additional input parameters. A 
demonstration of the improvement in accuracy were the degeneracy to be broken 
is shown in Figure |8~4] If we had some way of segregating z < 1 quasars, 1 < z < 2 
quasars, and z > 2 quasars from each other (say, by using non-photometric data or 
more specialised filters for detecting specific spectral lines as suggested in [6|^]). a 
neural network training on this additional information might yield results around 
crms — 0.1283, with 97.4% of objects within Az < 0.3. Segregation at z = 2 



showed most of the usual systematic deviations (Figure 8.5 1, confirming that the 
major source of errors is a degeneracy in < z < 2. 



8.3. Other Learning Techniques for Photometric Redshift Estimation 

Other potential improvements to our methods could include the use of cross- 
variance to avoid overfitting a training set, or weighting in committees of networks 
[9j p. 366]. Photometric redshift estimation has also been attempted using support 
vector machines (SVMs) (gj and @2])> GAs EH]> and random forests [12]. We 

■^Cf. also Hatziminaoglou et al. (2000) [19) . who point out the deficiencies of wide-band 
filters for quasars. 
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Figure 8.3. Plots of the CZR in colour-colour space, with one- 
sigma error ellipsoids for certain (labelled) redshifts. The CZR 
track is traced by the curves. From Weinstein et al. (2004) 



tested SVMs on our datasets using the SVMTorch II facility [H]Qbut found higher 
errors in all tests. Still, SVMs are attractive in that they can add extra input 
parameters with near-linear computational overhead 41 and limited dilution of 
accuracy in the presence of irrelevant data with some SVM algorithms |181 p. 385]. 

Computational considerations should also be taken into account: Hastie et al. 
(2001) show that, for N training data, p input variables, W weights, L training 
iterations, M basis functions, and m support vectors, ANNs require O(NpML) 
operations [H p. 367], RBFNs require 0(NM 2 + M 3 ) [18, p. 190], and SVMs 
tend to require 0(m 3 + mN + mpN) [181 p. 405]. 



SVMTorch II is available at http://www.idiap.ch/machine_learning.php? 
content=Torch/en_SVMTorch.txt. We used a Gaussian kernel with a width of 1.0 and er- 
ror pipes of 0.01 and 0.001. 
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Figure 8.4. z phot vs. z spec for three separate 5-committees of 
4:10:1 MLPs in Netlab, trained on datasets segregated at z = 1 
and z = 2 up to 500 iterations, ctrms 1S 0.1283, and 97.4% of 
objects fall within Az < 0.3. 




Figure 8.5. z p \- lot vs. z spec for two 5-committees of 4:10:1 MLPs, 
segregated at z = 2. orms is 0.2857, and 95.8% have Az < 0.3. 



Conclusion 



CHAPTER 9 



Implications of Research and Future Directions 

Part I of this dissertation demonstrated that canonical machine learning tech- 
niques could be adapted to solve a non-trivial astrophysical problem with only 
minor modifications. The same techniques could be used in a reverse manner, to 
approximate the mechanical formulae governing the behaviour of an observed sys- 
tem with known physical parameters. Alternatively, the problem solved could be 
generalised by using more precise, relativistic physical calculations to extend to the 
modelling of binary star systems and the search for other planetary star systems. 

Another interesting extension would be to model simultaneously the orbits of 
several moons with GAs and PSO: in principle, not all moons would be visible 
within a certain window or from a certain point of view (they would occasionally 
be obscured by or pass in front of the major planet), and the moons would not be 
'labelled' individually in each frameQ Still, a GA could begin to model the sys- 
tem by using a variable number of satellites (represented in each organism) equal 
to or greater than the maximum number of moons observed in any one frame, 
with the satellites ordered within the organism by their semi-major axis distance 
a. Crossover between two organisms with differing numbers of satellites could in- 
teract only those satellites whose a values are most similar. Mutation could also 
introduce or remove new satellites to an organism, and fitness could be measured at 
each time step by summing the minimum Euclidean distances between estimated 
moon positions and observed locations, using the location of the major planet as 
a moon location if there are more satellites represented in the organism than ob- 
served in the frame. This would allow the inclusion of frames in which some moons 
are obscured, and would also penalise organisms that represented more satellites 
than were present in the physical system. For PSO, multiple distinct swarms could 
be run simultaneously on the data, with some use of hyperspheres or hypercubes 
(an enforced minimum distance) in parameter space to prevent the swarms from 
converging on the same set of orbital elements. Such an extension would require 



lr This would be similar to using a set of observations like Galileo's (Figure 
cally determine the total number of moons and the orbital elements of each. 
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significantly greater computational resource Qbut this only recalls the growing need 
for more parallelisable and distributed algorithms for common problems in astro- 
physical research. 

Part II demonstrated that trivial applications of machine learning techniques 
can yield results comparable to the most advanced applications of traditional tech- 
niques in photometric redshift estimation. As mentioned in earlier chapters, the 
improvement of these learning techniques — for quasars in particular — will greatly 
assist research in cosmology [32 and the large-scale structure of the universe |47| . 
It would be useful to determine the relationship between photometric redshift ac- 
curacy and quasar magnitude, as the efficient estimation of faint quasar redshifts 
would assist in studying quasar evolution [32 . It would also be interesting to in- 
vestigate in more detail the statistical properties of the z p hot-z spe c relation and the 
CZR degeneracy. The impressive result of Ball et al. (to appear) [6] in selecting 
quasars according to properties of their z p hot probability density functions indi- 
cates that there may well be more statistical learning techniques of use in solving 
the CZR problem. 

Improvements may also be made by calculating the Jacobian matrix for RBFNs 



as in £ 6.2.1 and comparing its z p hot deviations a y with those obtained by ANNz, in- 
vestigating the relationship between computational expense and Az error as regards 
partially unsupervised RBFNs and (fully supervised) ANNs, incorporating X-ray 
and radio flux measurements, and experimenting with different activation functions 
and/or describing their distinct behaviours. However, aside from reducing Az er- 
ror and ensuring predictable confidence intervals, as not all science applications 
require minimal redshift errors [32], serious consideration should also be given to 
the unsupervised learning properties of RBFNs insofar as they may be of particular 
advantage in large sky survey science in the future. 



Indeed, the primary reason we have not tested multiple-satellite modelling is that using a 
vector (a variable-length array) of vectors caused our GA implementation to run remarkably 
slowly, especially considering the expected increase in computational time that modelling multiple 
satellites is likely to require. 
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Source Code: Modelling with GAs and PSO 

A.l. mlastro.cpp 

// mlastro.cpp 

#include "stdafx.h" 
#include <stdlib.h> 
#include <iostream> 
#include <fstream> 
#include <string> 
#define _USE_MATH_DEF INES 
#include <math.h> 
#include <vector> 
#include <time.h> 

using namespace std; 

#include " organism. h" 
#include "particle. h" 

const char* inputXYZ = "X : \\data\\2hAtlasXYZ . txt " ; 
const double EPOCH = 2453006.0; 

//2451545 for J2000.0 (Himalia) , 2450464.25 for 1997 Jan 16.00 TT (Io) , 
//2454600 for test, 2453006 for Jan 1 04 (Atlas) 

const bool LOG = true; //to turn on or off logging of stdev for convergence 

const bool test = false; //if using test set 

const bool taper = false; //use tapered vMAX for PSO 

class observ { 
public : 
double time; 
vector<double> x; 
vector<double> y; 
vector<double> satpang; 
vector<char> visibility; 

}; 

pair<double , double> keplerian(organism o, int sat, double timelnDays) { 
pair<double,double> result; 
result. first=0; 
result . second=0 ; 



double timeElapsed = timelnDays - EPOCH; //in days from EPOCH 
timeElapsed *= 86400; //now in seconds 

double M; //mean anomaly 

double n; //mean motion, in radians/sec: this is an avg, for circular orbits 
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//unable to represent o. a [sat] "3: 

n = (double)o.mu; 

n /= o.a[sat] ; 

n /= o . a [sat] ; 

n /= o.a[sat] ; 

n = sqrt (n) ; 

M = o.mO[sat] + n*timeElapsed; 

//radians + (radians/s) *s , from \protect\vrule widthOpt\protect\href {http : //en. wikipedia. org/wiki/Meai 

double E; //eccentric anomaly 
E = M; 

for (int i=0;i<8;i++) { 
E = M + o.e[sat]*sin(E) ; 

//from \protect\vrule widthOpt\protect\href -[http : //en. wikipedia. org/wiki/Eccentric_anomaly}{http: //en 
} 

double nu; //true anomaly, between -PI and PI 

nu = atan(sqrt((l.+o.e[sat])/(l.-o.e[sat]))*tan(E/2.))*2; 

if (nu<0) nu += 2*M_PI; 

//putting nu between and 2*PI, for theta/phi calculations in keplerianO 

double rho; //distance from major body, in same units as a (metres) 
rho = o.a[sat]*((l.-pow(o.e[sat] ,2) )/ (1 . + (o . e [sat] *cos(nu) ) ) ) ; 

result. first = rho/1000.; //this puts it in km for simplicity's sake 
result . second = nu; 

return result ; 
} 

pair<pair<double , double> , double> 

polarToXYZ (organism o.int sat ,pair<double ,double> polar){ 
pair<double , double> xy; 

pair<pair<double , double> , double> result ; 
double x=0,y=0,z=0; 

double rho, phi, theta, nu; 
rho = polar. fir st; 
nu = polar . second; 

double scaleTheta, scalePhi ; 
scaleTheta = cos (o . inclin[sat] ) ; 
scalePhi = sin(o . inclin [sat] ) ; 

theta = scaleTheta* (o.w [sat] +nu)+o. node [sat] ; //should be to 2*PI 
phi = (M_PI/2) -scalePhi* (o.w [sat] +nu) ; //should be to 2*PI 

//convert to xyz: in same units as rho (metres, now km) 
x = rho*sin(phi)*cos(theta) ; 
y = rho*sin(phi)*sin(theta) ; 
z = rho*cos(phi) ; 

xy. first = x; 
xy. second = y; 
result. first = xy; 
result . second = z; 
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return result ; 
} 

void outputOrganisms(vector<organism>& o) { 
of stream orgs; 

orgs . open ( "X : \\data\\organisms . txt " ) ; 

char timeStr[9] ; 

_strtime_s (timeStr) ; 

orgs << timeStr « endl « endl; 

for (unsigned int i=0; i<o . size() ; i++) { 

orgs « i+1 « "." « o[i] .satellites « endl; 

orgs << o[i]. fitness « endl; 

orgs « o[i].a[0] « "," « o[i].e[0] « "," « o [i] . inclin[0] « endl; 
orgs « o[i].node[0] « "," « o[i].w[0] « "," « o[i].m0[0]; 
orgs << endl « endl; 

} 

orgs . closeO ; 
} 

double rankFitnessXYZ(vector<organism>& o, const int observations) { 
vector<pair<double,double» X (observations) , Y(observations) ,Z (observations) ; 
if stream input; 
of stream best; 
of stream worst; 
string temp; 

input . open(inputXYZ) ; 

while (! input. eof()) { 

getline( input, temp) ; 

if (temp.compare("$$SOE")==0) { 

break; 

} 

} 

//file should be at EOF or at $$S0E here 
int count = 0; 

while ((linput.eof ())&&( input. peek () !='$')) { 

input » X [count] . first ; 

if (!test) get line( input, t emp ) ; 

input » X [count] . second » Y [count] . second >> Z [count] . second; 

getline (input , temp) ; 

Y [count] . first = X [count] . first ; 

Z [count] . first = X [count] . first ; 

count++; 

} 

input . close () ; 
double timelnDays; 
double bestFitness = 0; 
double worstFitness = 1; 
int b,w; 

char timeStr [9] ; 

for (unsigned int i=0; i<o . size () ; i++) { 

//iterating over all the organisms in the population 

o [i] .fitness = ; 

for (int j=0; j observations; j++) { 
timelnDays = X[j] . first; 
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pair<pair<double , double> , double> xyz= 

polarToXYZ(o [i] ,0 ,keplerian(o [i] , , timelnDays) ) ; 
o[i]. fitness += pow(xyz . first . first-X [j] . second, 2) + 
pow(xyz . first . second-Y[j] .second, 2) + pow(xyz . second-Z [j] .second, 2) ; 
} 

o [i] . fitness /= observations; //to normalise 

o[i] .fitness = l/o [i] . fitness ; 

if (o[i] .fitness > bestFitness) { 

bestFitness = o [i] . fitness ; 

b = i; 

} 

if (o[i] .fitness < worstFitness) { 

worstFitness = o [i] . fitness ; 

w = i; 

} 

} 

_strtime_s (timeStr) ; 
of stream genfitness; 

genf itness . open ( "X : \\data\\genf itness . txt " , ios : : app) ; 
genfitness . setf (ios : : scientific) ; 

cout « "Best fitness is " « bestFitness « "At"; 
if (LOG) genfitness « bestFitness « "\t"; 

else genfitness « "Best fitness is " « bestFitness « "At"; 

cout « "Worst is " « worstFitness « "At" « timeStr « endl; 

if (LOG) genfitness « worstFitness « "\t" « timeStr « "\t"; 

else genfitness « "Worst is " << worstFitness « "At" « timeStr << endl; 

genf itness . close () ; 

best . open( "X : \\data\\best . txt " ) ; 
worst . open( "X : \\data\\worst . txt " ) ; 

best << timeStr « endl « "Best fitness = " « bestFitness « endl; 
worst « timeStr « endl « "Worst fitness = " « worstFitness « endl; 
//right now this only writes the first elements 

best « "aAt" « o[b].a[0] « "\neAt" « o[b].e[0] « "\niAt"; 
best « o[b] .inclin[0] « "\nnodeAt" « o[b].node[0] « "\nwAt"; 

best « o[b].w[0] « "\nmOAt" « o[b].m0[0] « endl; 
worst « "aAt" « o[w].a[0] « "\neAt" « o [w] . e [0] « "\niAt"; 
worst « o[w] .inclin[0] « "\nnodeAt" « o[w].node[0] « "\nwAt"; 

worst « o[w].w[0] « "\nmOAt" « o[w].m0[0] « endl; 
best . closeO ; 
worst. close () ; 

return bestFitness; 
} 

pair<organism,organism> straightCrossover (organism ol, organism o2) { 
pair<organism,organism> o; 
o. first = ol; 
o. second = o2; 

double crossoverLevel = 0.25; 

double aR , eR , iR , nodeR , wR , mOR ; //these are randoms between and 1 
aR = (double)rand()/32768; 
eR = (double)rand()/32768; 
iR = (double)rand()/32768; 
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nodeR = (double)rand()/32768; 
wR = (double)rand()/32768; 
mOR = (double) rand 0/32768; 

long long itemp; 
double dtemp; 

if (aR < crossoverLevel) { 

itemp = o . first . a[0] ; 

o. first. a[0] = o . second . a [0] ; 

o . second. a [0] = itemp; 

} 

else if (eR < crossoverLevel) { 
dtemp = o . first . e [0] ; 
o.first.e[0] = o . second . e [0] ; 
o . second. e [0] = dtemp; 
} 

else if (iR < crossoverLevel) { 
dtemp = o . first . e [0] ; 
o.first.e[0] = o . second . e [0] ; 
o. second. e [0] = dtemp; 
} 

else if (nodeR < crossoverLevel) { 

dtemp = o . first . inclin[0] ; 

o .first . inclin [0] = o . second. inclin[0] ; 

o . second. inclin[0] = dtemp; 

} 

else if (wR < crossoverLevel) { 

dtemp = o . first .node [0] ; 

o .first .node [0] = o . second. node [0] ; 

o. second. node [0] = dtemp; 

} 

else if (mOR < crossoverLevel) { 

dtemp = o . first .m0 [0] ; 

o .first .m0 [0] = o . second. mO [0] ; 

o. second. mO [0] = dtemp; 

} 



return o; 
} 

void mutate (or ganismft o) { 
organism mutated; 
double mutateLevel = 0.15; 
mutated. create (o . satellites) ; 

double aR, eR, iR, nodeR, wR,m0R; //these are randoms between and 1 

aR = (double)rand()/32768; 

eR = (double)rand()/32768; 

iR = (double)rand()/32768; 

nodeR = (double)rand()/32768; 

wR = (double)rand()/32768; 

mOR = (double) rand 0/32768; 

double mutatePercent ; 

mutatePercent = (double) rand()/32768/2 - 0.25 + 1; 
//between -.25 + 1 and .25 + 1 = .75 and 1.25 
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if (aR < mutateLevel) o.a[0] *= mutatePercent; 

//a doesn't have strict bounds (besides >0) 
if (eR < mutateLevel) { 
o.e[0] *= mutatePercent; 
while (o.e[0]>=l) o.e[0] — ; 
} 

if (iR < mutateLevel) { 

o . inclin [0] *= mutatePercent ; 

while (o. inclin [0]>=2*M_PI) o. inclin [0] -= 2*M_PI ; 
} 

if (nodeR < mutateLevel) { 
o.node[0] *= mutatePercent; 

while (o.node[0]>=2*M_PI) o.node[0] -= 2*M_PI; 
} 

if (wR < mutateLevel) { 

o.w[0] *= mutatePercent; 

while (o.w[0]>=2*M_PI) o.w[0] -= 2*M_PI; 

} 

if (mOR < mutateLevel) { 
o.m0[0] *= mutatePercent; 

while (o.mO[0]>=2*M_PI) o.m0[0] -= 2*M_PI; 
} 

if (aR < mutateLevel/2) o.a[0] = mutated. a[0] ; 

if (eR < mutateLevel/2) o . e [0] = mutated, e [0] ; 

if (iR < mutateLevel/2) o.inclin[0] = mutated. inclin [0] ; 

if (nodeR < mutateLevel/2) o.node[0] = mutated. node [0] ; 

if (wR < mutateLevel/2) o.w[0] = mutated. w[0] ; 

if (mOR < mutateLevel/2) o.m0[0] = mutated. mO [0] ; 

} 



pair<double,double> keplerian(particle p, double timelnDays) { 
pair<double,double> result; 
result. first=0; 
result . second=0 ; 

double timeElapsed = timelnDays - EPOCH; //in days from EPOCH 
timeElapsed *= 86400; //now in seconds 

double M; //mean anomaly 

double n; //mean motion, in radians/sec 

//unable to represent p.a~3: 

n = (double)p.mu; 

n /= p. a; 

n /= p. a; 

n /= p. a; 

n = sqrt (n) ; 

M = p.mO + n*timeElapsed; 

//radians + (radians/s) *s , from \protect\vrule widthOpt\protect\href {http : //en. wikipedia. org/wiki/Meai 

double E; //eccentric anomaly 
E = M; 
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for (int i=0;i<8;i++) { 
E = M + p.e*sin(E) ; 

/ / from \protect\vrule widthOpt\protect\href {http : //en . wikipedia . org/wiki/Eccentr ic_anomaly}{http : //en 
} 



double mi; //true anomaly, between -PI and PI 
nu = atan(sqrt((l.+p.e)/(l.-p.e))*tan(E/2.))*2; 
if (nu<0) nu += 2*M_PI; 

//putting nu between and 2*PI, for theta/phi calculations in keplerianQ 



double rho; //distance from major body, in same units as a (metres) 
rho = p . a* ( (1 . -pow(p. e , 2) )/ (1 . +(p. e*cos (nu) ) ) ) ; 

result. first = rho/1000.; //this puts it in km for simplicity's sake 
result . second = nu; 

return result ; 
} 

pair<pair<double , double> , double> 

polarToXYZ (particle p, pair<double,double> polar) { 
pair<double , double> xy; 

pair<pair<double , double> , double> result ; 
double x=0,y=0,z=0; 

double rho, phi, theta, nu; 
rho = polar. fir st; 
nu = polar. second; 

double scaleTheta, scalePhi ; 
scaleTheta = cos (p . inclin) ; 
scalePhi = sin (p . inclin) ; 

theta = scaleTheta* (p .w+nu)+p .node ; //should be to 2*PI 
phi = (M_PI/2) -scalePhi* (p. w+nu) ; //should be to 2*PI 

//convert to xyz: in same units as rho (km) 
x = rho*sin(phi)*cos(theta) ; 
y = rho*sin(phi) *sin(theta) ; 
z = rho*cos (phi) ; 

xy. first = x; 
xy. second = y; 
result. first = xy; 
result . second = z; 

return result ; 

} 

void outputSwarm(vector<particle>& p) { 
of stream swarm; 

swarm. openC'X: \\dataWswarm.txt") ; 
char timeStr[9] ; 
_strtime_s (timeStr) ; 
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swarm « timeStr « endl « endl; 

for (unsigned int i=0 ; i<p . size () ; i++) { 

swarm « i+1 << endl; 

swarm « p[i] .fitness « endl; 

swarm « p[i].a « "," « p[i].e « "," « p[i].inclin « endl « p[i].node; 

swarm « "," « p[i].w « "," « p[i].mO « endl « endl; 

} 

swarm. close () ; 
} 

particle rankFitnessXYZ(vector<particle>& p, 
const int observations, particlefe pBest) { 
vector<pair<double,double» X (observations) , Y(observations) ,Z (observations) ; 
if stream input; 
of stream best; 
of stream worst; 
string temp; 

input . open(inputXYZ) ; 

while (! input . eof () ) { 

getline (input , temp) ; 

if (temp.compare("$$SOE")==0) { 

break; 

} 

} 

//file should be at EOF or at $$S0E here 
int count = 0; 

while ((linput.eof ())&&(input.peek() !='$')) { 

input » X [count] . first ; 

if (!test) get line( input, t emp ) ; 

input » X [count] . second » Y [count] . second >> Z [count] . second; 

getline (input, temp) ; 

Y [count] .first = X [count] . first ; 

Z [count] . first = X [count] . first ; 

count++; 

} 

input . close () ; 
double timelnDays; 
double bestFitness = 0; 
double worstFitness = 1; 
int b,w; 

char timeStr[9] ; 

for (unsigned int i=0 ; i<p . size () ; i++) { 

//iterating over all the particles in the swarm 
p[i] .fitness = 0; 

for (int j=0; j observations; j++) { 
timelnDays = X[j]. first; 
pair<pair<double ,double> , double> xyz = 

polarToXYZ(p[i] ,keplerian(p [i] .timelnDays)) ; 
p[i] .fitness += pow(xyz . first . first-X [j] . second, 2) 

+ pow(xyz .first . second-Y [j] . second, 2) + pow(xyz . second-Z [j] . second, 2) ; 
} 

p[i]. fitness /= observations; 
p[i] .fitness = 1/p [i] . fitness ; 
if (p[i] .fitness > bestFitness) { 
bestFitness = p [i] . fitness ; 
b = i; 
} 
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if (p [i] . fitness < worstFitness) {. 

worstFitness = p [i] .fitness ; 

w = i; 

} 

} 

if (bestFitness > pBest . fitness) pBest = p [b] ; 
_strtime_s(timeStr) ; 
of stream genfitness; 
if (LOG) { 

genfitness . open( "X : \\data\\genf itness .txt" , ios: : app) ; 
genfitness . setf (ios : : scientific) ; 
genfitness « bestFitness << "\t"; 

genfitness « worstFitness « "\t" « timeStr « "\t"; 

genfitness. close () ; 

} 

cout « "Best fitness is " « bestFitness « "At"; 

cout « "Worst is " « worstFitness « "At" « timeStr « endl; 

best . open ( "X : \\data\\best . txt " ) ; 

worst . openC'X A\data\\worst .txt") ; 

best << timeStr « endl « "Best fitness = " « bestFitness « endl; 
worst « timeStr « endl « "Worst fitness = " « worstFitness « endl; 
best « "a:\t" « p[b].a « "\neAt" « p[b] .e « "\ni:\t"; 
best « p[b].inclin « "\nnodeAt" « p[b].node « "\nwAt"; 

best « p[b].w « "\nmOAt" « p[b].mO « endl; 
worst « "aAt" « p[w].a « "\ne:\t" « p[w].e « "\niAt"; 
worst « p[w].inclin « "\nnodeAt" « p[w].node « "\nwAt"; 

worst « p[w].w « "\nmOAt" « p[w].mO « endl; 

best << endl << "Best so far = " << pBest . fitness « endl; 
best « "aAt" « pBest.a « "\ne:\t" « pBest . e « "\niAt"; 
best « pBest.inclin « "\nnodeAt" « pBest.node « "\nwAt"; 
best « pBest.w « "\nmOAt" « pBest.mO « endl; 

best . closeO ; 
worst. close () ; 

return p [b] ; //to aid selection process 
} 

int _tmain(int argc, _TCHAR* argv[]) { 

const int maxObservations = 31*12+1; 

//42 for abbrev data sets, 51 for test, 31*12+1 for 2h sets 
const int popSize = 100; 

const double replacementRate = 0.7; 
const double mutationRate = 0.15; 
const int generations = 3000000; 

const bool usePSO = false; //else use GA 

int maxSatellites = 1; 

//in a multiple-satellite implementation, this should be at least the length 
//of the longest x/y vector 

double bestFitness; 
particle pBest,lBest; 
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int j ; 

of stream genfitness; 

/ / genfitness . open ( "X : \\data\\genf itness . txt " ) ; 
//genfitness. closeO ; //clears file contents 

//creates a vector of _observ_'s of size _maxObservations_ 
vector<observ> data(maxObservations) ; 
cout.setf (ios: scientific) ; //or fixed 
srand( (int)time(O) ) ; //randomise 

vector<organism> pop(popSize) ; 
vector<particle> swarm(popSize) ; 
if (usePSO) { 

cout << "Swarm of " << popSize << " created. \n" ; 
for (int i=0 ; KpopSize ; i++) swarm [i] . create () ; 
cout << "Swarm initialised. \n" ; 
} 

else { 

cout << "Organisms created. \n"; 

for (unsigned int i=0 ; i<popSize ; i++) pop [i] . create (maxSatellites) ; 

cout << "Organisms initialised. \n" ; 

} 



pair<double,double> result; 

if (usePSO) result = keplerian (swarm [0] ,2454600) ; 
else result = keplerian (pop [0] , 0, 2454600) ; 

cout«"r : "«result .f irst«endl«"v: "«result . second«endl ; 

pair<pair<double , double> , double> xyz ; 
if (usePSO) xyz = polarToXYZ (swarm [0] .result) ; 
else xyz = polarToXYZ (pop [0] ,0,result) ; 
cout«"x: "«xyz .first . first <<endl ; 
cout«"y : "«xyz .first . second«endl ; 
cout«"z: "«xyz . second<<endl ; 

if (usePSO) { 
j = 0; 

while (j generations) { //and error criteria 
cout « j+1 « ". "; 
if (LOG) { 

genfitness . openC'X: \\data\\genf itness .txt" , ios: : app) ; 
genfitness « j+1 « "\t"; 
genfitness. close () ; 
} 

IBest = rankFitnessXYZ(swarm, data.sizeO, pBest) ; 
if (LOG) { 

genfitness . open( "X : \\data\\genf itness .txt " , ios: : app) ; 
genfitness . setf (ios : : scientific) ; 

double meanFitness =0.; 

for (unsigned int i=0 ; i<swarm. size () ; i++) { 

meanFitness += swarm [i] . fitness ; 

} 
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meanFitness /= swarm. size () ; 

//this is now the correct value of 1/meanFitness 



double stdev = 0.0; 

for (unsigned int i=0 ; i<swarm. size () ; i++) { 
stdev += pow (meanFitness - swarm [i] . fitness , 2) ; 
} 

stdev /= swarm. size() ; 
stdev = sqrt (stdev); 



genfitness « meanFitness << "\t" « stdev « "\n"; 

//file now reads generation [\t] bestFitness [\t] worstFitness [\t] 

//time [\t] meanFitness [\t] stdev [\n] 
genfitness. close () ; 
} 



outputSwarm( swarm) ; 

for (unsigned int i=0 ; i<swarm. size () ; i++) { 

swarm [i] . update (pBest .IBest) ; 

} 

//tapered vMAX enforcement 
if (taper) { 

if (j==5000) for (unsigned int i=0 ; i<swarm. size () ; i++) { 

swarm [i] . newVmax(l . 0) ; 

} 

else if (j==10000) for (unsigned int i=0; i<swarm. size () ; i++) { 

swarm [i] . newVmax (0.75); 

} 

else if (j==15000) for (unsigned int i=0; i<swarm. size () ; i++) { 

swarm [i] .newVmax(0 . 5) ; 

} 

else if (j==20000) for (unsigned int i=0; i<swarm. size () ; i++) { 

swarm [i] . newVmax (0.25); 

} 

else if (j==25000) for (unsigned int i=0 ; i<swarm. size () ; i++) { 

swarm [i] . newVmax (0. 1) ; 

} 

else if (j==30000) for (unsigned int i=0; i<swarm. size () ; i++) { 

swarm [i] . newVmax (0.05); 

} 

} 

} 
} 

else for (j=0; j generations; j++) { //begin GA loop 
genfitness. open( "X : \\data\\genf itness . txt " , ios: : app) ; 
cout << j+1 « ". "; 
if (LOG) genfitness « j+1 « "\t"; 

else genfitness « j+1 << ". "; //inserts generation count, e.g., "1. " 
genfitness . close () ; 

//get fitness of all organisms by measuring SSE of XYZ coords 

bestFitness = rankFitnessXYZ(pop, data. size() ) ; 

double totalFitness =0.; 

double probability; 

double selectRand; 

double mutateRand; 
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int numSelected = 0; 
outputOrganisms(pop) ; 

//evolve: select, crossover, mutate, update, then re-evaluate fitness 
/ / select : 

for (unsigned int i=0 ; i<pop . size () ; i++) { 

totalFitness+=pop[i] .fitness; 

pop [i] . selected = false; 

} 

do { //new roulette wheel selection method 
selectRand = ( (double)rand()/32768) *totalFitness ; 
for (unsigned int i=0 ; i<pop . size () ; i++) { 
//probability = pop[i] .f itness/totalFitness; 
if (!pop[i] .selected) { 
selectRand -= pop [i] . fitness ; 
if (selectRand <= 0) { 
pop [i] . selected = true; 
numSelected++ ; 

totalFitness -= pop [i] . fitness ; 

//exclude this organism from future roulette selections 

break; 

} 

} 

} 

} while (numSelected < replacementRate*pop . size () ) ; 

int numToCross = pop. size () - numSelected; //numToCross must be even 

//copy selected 
vector<organism> popNew; 

for (unsigned int i=0 ; i<pop . size () ; i++) { 

if (pop [i] . selected) popNew.push_back(pop[i] ) ; 

y 

//at this point, popNew contains only selected orgs, not crossed-over ones 
//this is where we output to genfitness the stdev and mean of fitness 
if (LOG) { 

genfitness. openC'X: \\data\\genf itness .txt" , ios: : app) ; 
genfitness . setf (ios : : scientific) ; 

double meanFitness =0.; 

for (unsigned int i=0 ; KpopNew. size (); i++) { 
meanFitness += popNew [i] .fitness; 

> 

meanFitness /= popNew. size () ; 

//this is now the correct value of 1/meanFitness 
double stdev = 0.0; 

for (unsigned int i=0 ; KpopNew. size (); i++) { 

stdev += pow (meanFitness - popNew[i] .fitness, 2) ; 

> 

stdev /= popNew. size () ; 
stdev = sqrt (stdev); 



genfitness « meanFitness « "\t" « stdev « "\t"; 
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//file now reads generation [\t] bestFitness [\t] worstFitness [\t] 

/ /time [\t] meanFitness [\t] stdev [\t] 
genfitness. close () ; 
} 

//crossover in popNew: 

for (int i=0;i<numToCross/2;i++) { 

int tl,t2; 

tl = rand() '/, numSelected; 

//we use numSelected so we're only crossing over old organisms 
t2 = randO '/, numSelected; 

pair<organism,organism> newOrg = straightCrossover(popNew[tl] ,popNew[t2] ) ; 
popNew.push_back(newOrg.f irst) ; 
popNew . push_back (newOrg . second) ; 
} 

//mutate in popNew: 

for (unsigned int i=0 ; i<popNew. size () ; i++) { 

mutateRand = (double)rand()/32768; 

if (mutateRand<mutationRate) mutate (popNew [i] ) ; 

} 

//here output to genfit the std, mean of fitness of the entire crossed-over pop. 
if (LOG) { 

genfitness . open( "X: \\data\\genf itness .txt" , ios: : app) ; 
genfitness . setf (ios : : scientific) ; 

double meanFitness =0.; 

for (unsigned int i=0 ; i<popNew. size () ; i++) { 

meanFitness += popNew [i] .fitness; 

} 

meanFitness /= popNew. size() ; 

//this is now the correct value of 1/meanFitness 

double stdev = 0.0; 

for (unsigned int i=0 ; i<popNew. size () ; i++) { 
stdev += pow (meanFitness - popNew [i] . fitness ,2) ; 

y 

stdev /= popNew. size () ; 
stdev = sqrt (stdev); 

genfitness « meanFitness << "\t" « stdev « endl; 

//file now reads generation [\t] bestFitness [\t] worstFitness [\t] time [\t] 
//meanFitnessSelected [\t] stdevSelected[\t]meanFitnessAll [\t] stdevAll [\n] 
genfitness. close () ; 
} 

//update population: 
pop = popNew; 

} //end loop 

//return organism with highest fitness 



return 0; 
} 



A. 3. PARTICLE. H 



A. 2. organism.h 

// organism.h 

class organism { 
public : 
organism () ; 
double fitness; 
bool selected; 
void create (int); 
int satellites; 

static const unsigned long long mu = 37931187000000000; 
//standard G for jupiter (126686534000000000) in m~3/s~2 
//test = 330531054456064 
//saturn = 37931187000000000 

//note that orbital period P = (2*pi) /sqrt (mu) *a~ (3/2) (in seconds) 
long long a[l] ; 

double e[l] ,inclin[l] ,node[l] ,w[l] ,m0[l] ; 

}; 

organism: : organism () { 
fitness = -1 . ; 
satellites = 0; 
} 

void organism: : create (int maxSize) { //create random organism 
satellites = rand() 7, maxSize + 1; 
double period; 

for (int i=0 ; Ksatellites ; i++) { 
a[0] = (((long long)rand()+l)*8000) ; 

//Io: 25000, Himalia: 700000, test: 6000, Atlas: 8000 
//a should be on the order of 421,800,000 for Io's orbit, 
//a should be on the order of 11,461,000,000 for Himalia's orbit. 
e[0] = ((double)rand()/32768) ; //e should be between and 1 
inclin[0] = ((double)rand()/32768*2*M_PI) ; 
node[0] = ( (double)rand()/32768*2*M_PI) ; 
w[0] = ((double)rand()/32768*2*M_PI) ; 
m0[0] = ((double)rand()/32768*2*M_PI) ; 

> 
} 

A. 3. particle. h 

// particle. h 

class particle { 
public : 

double cl, c2; //learning factors 

double vMAX; //set vMAX below 

double evMAX; //eccentricity maximum velocity 

particleO ; 

void create () ; 

double fitness; 

static const unsigned long long mu = 37931187000000000; 
//standard G for jupiter (126686534000000000) in m~3/s~2 
//test = 330531054456064 
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//saturn = 37931187000000000 

long long a; //a could be unsigned as well 
double e, inclin, node, w, mO; 

long long aV; 

double eV, iV, nV, wV, mOV; 

void update (particle, particle); 
void newVmax (double) ; 

}; 

particle: :particle() { 
vMAX = 1.0; 
evMAX =0.1; 

fitness = -1 . ; 
cl = 2. ; 
c2 = 2. ; 

//particle positions 

a = 0; 

e = 0. ; 

inclin = . ; 

node = . ; 

w = 0. ; 

mO = . ; 

//particle velocities 

aV = 0; 

eV = . ; 

iV = . ; 

nV = . ; 

wV = . ; 

mOV = . ; 

} 

void particle :: create () { 

a = ((long long)rand()+l)*8000; 

e = (double) rand 0/32768; 

inclin = (double ) rand () /32768*2*M_PI ; 

node = (double)rand()/32768*2*M_PI; 

w = (double) rand ()/32768*2*M_PI; 

mO = (double)rand()/32768*2*M_PI; 

} 

void particle : :update (particle pBest, particle IBest) {. 
double delta [8] ; 

delta[0] = pBest. inclin - inclin; 
delta[l] = IBest. inclin - inclin; 
delta [2] = pBest.node - node; 
delta [3] = IBest. node - node; 
delta [4] = pBest.w - w; 
delta[5] = IBest. w - w; 
delta [6] = pBest.mO - mO; 
delta[7] = IBest. mO - mO; 
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for (int i=0;i<8;i++) { 

if (delta [i]>M_PI) delta[i] -= 2*M_PI; 

if ( delta [i]<(-l.*M_PI)) delta [i] += 2*M_PI; 

} 

//this moves the particles randomly -> one parameter but maybe not another 

/*aV += cl* ((double)rand()/32768)*( (double) (pBest. a - a)) 

+ c2*((double)rand()/32768)*((double) (IBest.a - a)); 

eV += cl*((double)rand()/32768)*(pBest.e - e) 

+ c2*((double)rand()/32768)*(LBest.e - e) ; 

iV += cl*((double)rand()/32768)*delta[0] 

+ c2*((double)rand()/32768)*delta[l] ; 

nV += cl*((double)rand()/32768)*delta[2] 

+ c2*( (double) rand ()/32768)*delta [3] ; 

wV += cl*((double)rand()/32768)*delta[4] 

+ c2*((double)rand()/32768)*delta[5] ; 

mOV += cl*((double)rand()/32768)*delta[6] 

+ c2*( (double) rand ()/32768)*delta [7] ;*/ 

//this treats the entire vector as one: particles move towards best values 

//across all parameters simultaneously 
double randl , rand2 ; 
randl = (double)rand()/32768; 
rand2 = (double)rand()/32768; 

aV += cl*randl* ( (double) (pBest. a - a) ) + c2*rand2* ( (double) (IBest . a - a)); 

eV += cl*randl*(pBest.e - e) + c2*rand2* (IBest . e - e) ; 

iV += cl*randl*delta[0] + c2*rand2*delta [1] ; 

nV += cl*randl*delta[2] + c2*rand2*delta [3] ; 

wV += cl*randl*delta[4] + c2*rand2*delta [5] ; 

mOV += cl*randl*delta[6] + c2*rand2*delta[7] ; 

eV = min(eV, (double) evMAX) ; 

eV = max(eV, (double) (evMAX* (-1) ) ) ; 

iV = minUV, (double) vMAX) ; 

iV = max(iV, (double) (vMAX*(-l))) ; 

nV = min(nV, (double) vMAX) ; 

nV = max(nV, (double) (vMAX*(-l))) ; 

wV = min(wV, (double )vMAX) ; 

wV = max(wV, (double) (vMAX*(-l))) ; 

mOV = min(mOV, (double )vMAX) ; 

mOV = max (mOV, (double) (vMAX*(-l))) ; 



a += aV; 

a = max(a,(long long)l); 
e += eV; 

e = min(e, 0.999999) ; 
e = max(e,0. ) ; 

inclin += iV; 

while (inclin >= 2*M_PI) inclin -= 2*M_PI; 
while (inclin < 0) inclin += 2*M_PI; 

node += nV; 

while (node >= 2*M_PI) node -= 2*M_PI; 
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while (node < 0) node += 2*M_PI; 
w += wV; 

while (w >= 2*M_PI) w -= 2*M_PI ; 
while (w < 0) w += 2*M_PI; 

mO += mOV; 

while (mO >= 2*M_PI) mO -= 2*M_PI; 

while (mO < 0) mO += 2*M_PI; 

} 

void particle: :newVmax (double newV) { 

vMAX = newV; 

} 



APPENDIX B 



Source Code: MATLAB Scripts 

B.l. ANNz Script 

'/, part2annz.m 

function [a b c] = part2annz(f ile) 
•/.for ANNz 

fid = fopen(file, 'rt'); 
data = fscanf(fid, >°/„f 7„f If, [3 inf ] ) ; 
f close(f id) ; 

data = data' ; 

count = size (data, 1) ; 

for i = 1 : count 

data(i,4) = (data(i , 1) -data(i , 2) ) ~2 ; 

end; 

rmse = 0; 
dell = 0; 
del2 = 0; 
del3 = 0; 
for i = 1 : count 

rmse = rmse + data(i,4); 
if data(i,4) <= .01, dell 
if data(i,4) <= .04, del2 
if data(i,4) <= .09, del3 

end; 

rmse = rmse/count; 
rmse = sqrt(rmse), 

dell = dell/count, 
del2 = del2/count, 
del3 = del3/count, 

rmsnoise = sqrt (sum(data( : ,3) . *data( : ,3) )/count) ; 

errors(row, : ) = [rmsnoise rmse dell del2 del3] ; 
row = row + 1 ; 

close all; 

figure (> Position ' , [20 540 560 420]); 

a = plot(data(: ,1) ,data(: ,2) , 'o' , 'MarkerSize' ,2) ;line([0 6] , [0 6] , ' Color ' , 'r ' ) ; 
figure ( 'Position' , [620 60 560 420]); 



= dell + 1; end; 
= del2 + 1; end; 
= del3 + 1; end; 
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b = plot(data( : , 1) ,data( : ,3) , 'o' , 'MarkerSize' ,2) ; 
figure('Position' , [20 60 560 420]); 

c = plot(data(: ,1) ,abs(data(: ,2)-data(: ,1)) , 'o> , 'MarkerSize' ,2) ; 

B.2. MLP/Netlab Script 

'/, part2mlp.m 



act = ' linear ' ; 
nout = 1 ; 

comSize = 5; 7. committee size 
nhidden = 10; 
iter = 500; 



[x4,t4] = datread( 'x:\quasartrain-UGRI.dat' ,4,1,37136) ; 
[x4t,t4t] = datread( 'x:\quasartest-UGRI.dat' ,4,1,9284) ; 
[x5,t5] = datread( 'x: \quasartrain-ugriz . dat ' ,5 , 1 , 37136) ; 
[x5t,t5t] = datread( 'x : \quasartest-ugriz .dat ' , 5 , 1 ,9284) ; 
[x6,t6] = datread( 'x:\quasartrain-ugriz6192. dat' ,5,1,4954) ; 
[x6t,t6t] = datread('x:\quasartest-ugriz6192. dat' ,5,1,1238) ; 

[x7,t7] = datread( 'x:\quasartrain-UGRIZJH. dat' ,7,1,4954) ; 
[x7t,t7t] = datread( 'x:\quasartest-UGRIZJH. dat' ,7,1,1238) ; 
[x8,t8] = datread( 'x: \quasartrain-ugrizjhk. dat' ,8,1,4954) ; 
[x8t,t8t] = datread('x:\quasartest-ugrizjhk. dat' ,8,1,1238) ; 

[x9,t9] = datread( 'x:\quasartrain-ugrizUGRI. dat' ,9,1,37136) ; 
[x9t,t9t] = datread( 'x:\quasartest-ugrizUGRI. dat' ,9,1,9284) ; 

[xOl.tOl] = datread( 'x:\trainz01. dat' ,4,1,10159) ; 
[xOlt.tOlt] = datread( 'x:\testz01. dat' ,4,1,2540) ; 
[xl2,tl2] = datread( 'x:\trainzl2. dat' ,4,1,19166) ; 
[xl2t,tl2t] = datread('x:\testzl2. dat' ,4,1,4792) ; 
[x2,t2] = datread( 'x:\tr ainz2.dat' ,4,1,7810) ; 
[x2t,t2t] = datread( 'x:\testz2. dat' ,4,1,1953) ; 
[x02,t02] = datread( 'x:\trainz02.dat' ,4,1,29325) ; 
[x02t,t02t] = datread('x:\testz02. dat' ,4,1,7332) ; 

trainx = {x4, x5, x6, x7, x8, x9, xOl, xl2, x2, x02>; 

traint = {t4, t5, t6, t7, t8, t9, tOl, tl2, t2, t02>; 

testx = {x4t, x5t, x6t, x7t , x8t , x9t, xOlt, xl2t, x2t , x02t}; 

testt = {t4t, t5t, t6t, t7t, t8t, t9t, tOlt, tl2t, t2t, t02t}; 



clear skip; 

skip (10, comSize) = 0; 
skipActive = 0; 

7. if skipActive = 1 but skipO () is all Os, this retrains the MLPs by iter 



'/, skipActive = 1; 



7. skip(l, 


) = l; 


7. skip (2, 


) = l; 


7. skip (3, 


) = l; 


7, skip (4, 


) = l; 


7. skip (5, 


) = l; 


7. skip (6, 


) = l; 


7. skip (7, 


) = l; 


7. skip (8, 


) = l; 


7. skip (9, 


) = l; 
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% skipClO,:) = 1; 

if skipActive==l , savemlps=mlps ; end; 
if skipActive==0 , mips = cell (6 , comSize) ; end; 
y = cell(6,comSize) ; 
rms (6 , comSize) = 0; 
comOutput = cell (6); 
comRMS(6) = 0; 
meanRMS(6) = 0; 

if skipActive==0 

for j = 1 : size (mips ,2) 
mlps-(l,j} = mlp(4, 
mlps-[2,j} = mlp(5, 
mlps{3,j} = mlp(5, 
mlps{4,j} = mlp(7, 
mlps{5,j} = mlp(8, 
mlps-[6,j} = mlp(9, 
mlps-[7,j} = mlp(4, 
mlps{8,j} = mlp(4, 
mlps-[9,j} = mlp(4, 
mlps-ClO.j}- = mlp(4 

end; 

end; 

for i = 1 : size (mips , 1) 

for j = 1 : size (mips ,2) 
if skip(i,j)==0 

mlps{i,j} = mlptrain(mlps{i , j}, trainx{i}, traint{i}, iter); 

end; 

y{i,j} = mlpf wd(mlps{i , j}, testx{i}) ; 

rms(i,j) = sqrt (mean( (y{i , j}— testt{i}) . * (y{i , j}-testt-[i}) ) ) ; 
fprintfd, 'MLP 7,d, member °/,d finished. \n' , i, j); 

end; 

comOutput{i} = 0; 

for j = l:comSize, comOutput{i} = comOutput{i} + y{i,j}; end; 
comOutput{i} = comOutput{i}- / comSize; 

comRMS(i)=sqrt (mean ( (comOutput {i}-testt{i}) . * (comOutput {i}—testt{i} ) ) ) ; 
meanRMS(i) = mean(rms (i , 1 : size (mips ,2) ) ) ; 
fprintfd, 'Finished with MLP committee °/,d.\n', i) ; 

end; 

saverms = [rms comRMS' meanRMS'] ; 
savemlps = mips; 

B.3. RBFN Script 

'/. part2rbf.m 

act = 'tps ' ; 
nout = 1 ; 

comSize = 5; 7, committee size 
nhidden = 20; 
options = f opt ions; 

options(l) = 25; '/, display errors/how often? 
options (14) = 100; number of iterations 



nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 



, nhidden, nout, act); 
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[x4,t4] = datreadC 'x:\quasartrain-UGRI.dat' ,4,1,37136) ; 
[x4t,t4t] = datread ('x:\quasartest-UGRI.dat' ,4,1,9284) ; 
[x5,t5] = datread ('x:\quasartrain-ugriz.dat' ,5,1,37136) ; 
[x5t,t5t] = datreadC 'x : \quasartest-ugriz .dat ', 5 , 1 ,9284) ; 
[x6,t6] = datread('x:\quasartrain-ugriz6192. dat' ,5,1,4954) ; 
[x6t,t6t] = datread('x:\quasartest-ugriz6192.dat' ,5,1,1238) ; 



[x7,t7] = datread ('x:\quasartrain-UGRIZJH. dat' ,7,1,4954) ; 
[x7t,t7t] = datreadC 'x:\quasartest-UGRIZJH.dat' ,7,1,1238); 
[x8,t8] = datreadC 'x: \quasartrain-ugrizjhk. dat ' ,8 , 1 ,4954) ; 
[x8t,t8t] = datreadC 'x : \quasartest-ugrizjhk . dat ', 8, 1 , 1238) ; 

[x9,t9] = datreadC 'x:\quasartrain-ugrizUGRI. dat' ,9,1,37136) ; 
[x9t,t9t] = datreadC 'x:\quasartest-ugrizUGRI. dat' ,9,1,9284) ; 

trainx = {x4, x5, x6, x7, x8, x9}; 

traint = {t4, t5, t6, t7, t8, t9>; 

testx = {x4t, x5t, x6t, x7t , x8t , x9t}; 

testt = {t4t, t5t, t6t, t7t, t8t, t9t}; 



clear skip; 

skip (6, comSize) = 0; 

skipActive = 0; 



skipActive = 1; 

skipCl, :) = 1; 

skipC2, :) = 1; 

skipC3, :) = 1; 

skipC4, :) = 1; 

skip (5, : ) = 1 ; 

skip (6, :) = 1; 



if skipActive==l , saverbf s=rbf s ; end; 

if skipActive==0 , rbfs = cell(6,comSize) ; end; 

a = cell(6,comSize) ; 

rms(6,comSize) = 0; 

comOutput = cell (6); 

comRMS(6) = 0; 

meanRMS(6) = 0; 



if skipActive==0 

for j = 1 : size(rbf s , 2) 
rbfs{l,j} = rbf(4, 
rbfs{2,j} = rbf(5, 
rbfs{3,j} = rbf(5, 
rbfs{4,j} = rbf (7, 
rbfs{5,j} = rbf (8, 
rbfs{6,j} = rbf (9, 

end; 

end; 



nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 


nhidden, 


nout , 


act) 



for i = 1 : size (rbf s , 1) 

for j = 1 : size(rbf s , 2) 
if skip(i,j)==0 

rbfs{i,j> = rbf trainCrbf s{i , j}, options, trainx{i>, traint{i}) ; 

end; 
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a{i,j} = rbffwd(rbf s{i, j}, testx{i}) ; 

rms(i,j) = sqrt (mean( (a{i , j}— testt{i}) . * (a{i , j}- testt-Ci}) ) ) ; 
fprintfd, 'RBF °/,d, member °/,d finished. \n' , i, j); 

end; 

comOutput{i} = 0; 

for j = l:comSize, comOutput{i} = comOutput{i} + a-[i,j]-; end; 
comOutput{i} = comOutput{i}- / comSize; 

comRMS (i) =sqrt (mean( (comOutput{i}— testt{i}) . * (comOutput{i}— testt{i}) ) ) ; 
meanRMS(i) = mean(rms (i , 1 : size (rbf s ,2) ) ) ; 
fprintfd, 'Finished with RBF committee °/,d.\n', i) ; 

end; 



saverms = [rms comRMS' meanRMS']; 
saverbfs = rbfs; 
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